[Monday, June 18, 2018] [11:26:25 AM PDT] abadger1999: regarding last comment on https://github.com/ansible/ansible/pull/41629 - Using fcntl.lockf() instead of fcntl.flock(), I have no real preference here, I've used both in the past
[Monday, June 18, 2018] [11:26:38 AM PDT] fz: K.
[Monday, June 18, 2018] [11:27:23 AM PDT] lockf might provide some nice extras being a wrapper
[Monday, June 18, 2018] [11:27:36 AM PDT] but I don't know for sure
[Monday, June 18, 2018] [11:27:41 AM PDT] I just get that impression from reading a bit (that fcntl.lockf() is more standardized than flock [but I'm not sure if we care about the outliers there], and that flock() has more variance in implementation than lockf)
[Monday, June 18, 2018] [11:28:18 AM PDT] neither is perfect, but with locks, perfect is not a thing
[Monday, June 18, 2018] [11:29:01 AM PDT] I've only used kernel-level locking once. Most of the time I implement (or use someone else's implementation of) lockfile locking.
[Monday, June 18, 2018] [11:29:02 AM PDT] yeah, it's pretty flaky
[Monday, June 18, 2018] [11:29:25 AM PDT] Like flufl.lock
[Monday, June 18, 2018] [11:29:50 AM PDT] the "only works if everybody plays nice" isn't my favorite
[Monday, June 18, 2018] [11:30:04 AM PDT] you are mostly OK until you deal with 'special' filesystems, mostly network mounts
[Monday, June 18, 2018] [11:30:06 AM PDT] but it's how it's done
[Monday, June 18, 2018] [11:30:35 AM PDT] yeah, locks are 'advisory' so we can only guarantee when dealing with 'ourselves' but not 3rd parties
[Monday, June 18, 2018] [11:30:38 AM PDT] Yeah, not enough OSes implement mandatory locking (at least, in a sufficiently non-buggy way ;-)
[Monday, June 18, 2018] [11:31:10 AM PDT] @abadger1999 i know you've seen me type this before: 'locks are the worst possible form of conflict resolution, even when they are the only one possible'
[Monday, June 18, 2018] [11:31:33 AM PDT] ^ that is not even taking implementations into account, as they are a very deep hell i know too well
[Monday, June 18, 2018] [11:32:21 AM PDT] I agree with using context managers, that way the file descriptor can be closed when the operations have completed. Although it will probably complicate things somewhat for modules using it, since some, at least lineinfile, read/write in different functions
[Monday, June 18, 2018] [11:33:15 AM PDT] actually, you really don't need to worry about that, both lockf/flock will auto release when the process dies (really when the file descriptor is closed)
[Monday, June 18, 2018] [11:33:47 AM PDT] so for modules, which are short lived forks by nature, it's ok
[Monday, June 18, 2018] [11:34:01 AM PDT] if we were doing locking in core ... that is another story
[Monday, June 18, 2018] [11:34:03 AM PDT] bcoca: oh, I thought only the lock was released and the fd was left dangling
[Monday, June 18, 2018] [11:35:18 AM PDT] when the process terminated, that is
[Monday, June 18, 2018] [11:35:21 AM PDT] fz: I'd make it so you can use it either as a context manager or as a function.
[Monday, June 18, 2018] [11:36:43 AM PDT] fz: fds will all close once the process ends, can dangle for a bit, but only if the kernel is very busy
[Monday, June 18, 2018] [11:36:47 AM PDT] is it ok to put that in basic.py? - sorry if I ask a bunch of obvious stuff here, kind of new to contributing here so I'm not very familiar with aesthetics
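A minimal sketch of the context-manager idea discussed above (a hypothetical helper, not the code from the PR under review): flock() is taken on the open descriptor, so the kernel releases the lock automatically when that descriptor is closed or the short-lived module process exits.

```python
# A minimal sketch, not the implementation from the PR; the helper name
# `locked_file` is made up for illustration.
import fcntl
from contextlib import contextmanager

@contextmanager
def locked_file(path, mode='r+b'):
    """Open *path* and hold an exclusive flock() on it for the with-block.

    The kernel drops the lock automatically when the descriptor is closed,
    including when the process exits.
    """
    f = open(path, mode)
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until the lock is acquired
        yield f
    finally:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)  # explicit unlock; closing would also release it
        f.close()

# usage sketch:
# with locked_file('/etc/hosts') as f:
#     lines = f.readlines()
#     ...
```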
[Monday, June 18, 2018] [11:37:30 AM PDT] fz: yes and now, it belongs there .. but we are trying to break up basic.py into more digestible chunks
[Monday, June 18, 2018] [11:37:38 AM PDT] er 'and no'
[Monday, June 18, 2018] [11:38:09 AM PDT] ok, should I start somewhere new then? this being a new feature and all
[Monday, June 18, 2018] [11:38:32 AM PDT] @abadger1999 ^ you are the one restructuring that, which file should he target?
[Monday, June 18, 2018] [11:39:37 AM PDT] i'd rather make it future proof(c) if possible :)
[Monday, June 18, 2018] [11:40:06 AM PDT] Since this is new functionality, I'd put it in module_utils/common/something.... sivel^ What file did you put file-related stuff in for the restructuring of basic.py?
[Monday, June 18, 2018] [11:40:23 AM PDT] also... I can never find it since it's not yet a PR ;-)
[Monday, June 18, 2018] [11:41:44 AM PDT] fz: https://github.com/sivel/ansible/blob/categorization-of-basic/lib/ansible/module_utils/common/file.py
[Monday, June 18, 2018] [11:41:45 AM PDT] abadger1999: it's commented on your gist
[Monday, June 18, 2018] [11:42:32 AM PDT] fz: So use module_utils/common/file.py as the filename. Don't worry about anything else that's in that file in that branch... that's all WIP.
[Monday, June 18, 2018] [11:42:55 AM PDT] sivel: Cool. I'll try and remember that for next time.
[Monday, June 18, 2018] [11:43:59 AM PDT] the 'simple case' we can solve easily with any of the locks, i.e. - hosts: all, lineinfile: /etc/hosts, delegate_to: localhost
[Monday, June 18, 2018] [11:44:24 AM PDT] Hmm... otoh, flock only releases the lock when the actual file descriptor closes whereas fcntl releases it when any file descriptor for that file closes....
[Monday, June 18, 2018] [11:44:29 AM PDT] I would just add warnings/notes/red rotating light to inform people about 'locks might not work on certain configurations'
[Monday, June 18, 2018] [11:44:55 AM PDT] @abadger1999 each lock implementation has its own 'niceties'
[Monday, June 18, 2018] [11:45:08 AM PDT] why locks + threading == OMFG
[Monday, June 18, 2018] [11:45:16 AM PDT] fz: Ah, but you're using a different lockfile to work around that. Nice.
[Monday, June 18, 2018] [11:45:19 AM PDT] abadger1999: OK, so branch "categorization-of-basic" on sivel's fork?
[Monday, June 18, 2018] [11:46:02 AM PDT] fz: You can ignore that mostly... I had to look up the file path that we want from there. But what you're doing is new code so you don't really need to see it.
[Monday, June 18, 2018] [11:46:04 AM PDT] i would still shy away from lockfiles, they tend to have more problems than fcntl/flock/lockf
[Monday, June 18, 2018] [11:46:19 AM PDT] fz: Sorry for rambling my thoughts out here... that was unnecessary information.
[Monday, June 18, 2018] [11:46:24 AM PDT] especially on cleanup
[Monday, June 18, 2018] [11:46:41 AM PDT] @abadger1999 we apologize for rambling now? ... do i just add that to my signature?
[Monday, June 18, 2018] [11:47:09 AM PDT] bcoca: fz's PR uses a lockfile. Which gets rid of some of the problems with pure fcntl. But it still uses fcntl so that it gets the cleanup semantics.
[Monday, June 18, 2018] [11:47:37 AM PDT] ??? afaik fcntl won't remove lockfiles in all cases
[Monday, June 18, 2018] [11:47:58 AM PDT] yes, it was to avoid locks on nfs and other weird stuff
[Monday, June 18, 2018] [11:48:08 AM PDT] the problem with lockfiles, you cannot remove them on bad exits, kernel locks are removed by the kernel ... so that problem does not exist
[Monday, June 18, 2018] [11:48:11 AM PDT] with pure fcntl, if you have fd1 = open('a') ; fcntl.lockf(fd1) ; fd2 = open('a') ; fd2.close(), then the lock is removed. By creating a temporary lockfile, you work around that problem for most cases.
[Monday, June 18, 2018] [11:48:25 AM PDT] fcntl drops the lock, it doesn't clean up the lockfile.
[Monday, June 18, 2018] [11:48:34 AM PDT] fz: weird locks with nfs will always happen, even with lockfiles, mostly you rely on being 'lucky' and/or having low concurrency
[Monday, June 18, 2018] [11:49:20 AM PDT] yes, however if you have /tmp as an NFS mount you might have other problems
[Monday, June 18, 2018] [11:49:21 AM PDT] Whereas if you use a lockfile without fcntl/flock, the temporary lockfile that's created can make other code think that something is still actively holding the lock even when the process has ended.
[Monday, June 18, 2018] [11:49:21 AM PDT] @abadger1999 old lockfiles tend to create bigger problems than badly coded lock handling .. lock handling we can code correctly, lockfile cleanup is much harder
[Monday, June 18, 2018] [11:50:02 AM PDT] bcoca: that's the nice thing about fz's hybrid implementation. It doesn't have the major problems of either pure fcntl or pure lockfile.
[Monday, June 18, 2018] [11:50:31 AM PDT] It has some of the annoyances from both but it doesn't have the major problems from either.
[Monday, June 18, 2018] [11:50:41 AM PDT] i was going to say ..
[Monday, June 18, 2018] [11:51:15 AM PDT] With his implementation, it doesn't matter if the lockfile isn't cleaned up as the lock itself has been released.
[Monday, June 18, 2018] [11:51:17 AM PDT] it's a tradeoff, but i think kernel locks are still better, the major problem with them is mostly in the code
[Monday, June 18, 2018] [11:51:27 AM PDT] (which is the major problem with lockfiles)
[Monday, June 18, 2018] [11:51:48 AM PDT] no, lockfiles have no real good solution in your code as you are at the mercy of the kernel
[Monday, June 18, 2018] [11:52:29 AM PDT] while kernel locks you can handle predictably in all cases on the process side, the problem is when the filesystem driver 'lies' .... which is what most network file systems do
[Monday, June 18, 2018] [11:52:34 AM PDT] but that also affects lockfiles
[Monday, June 18, 2018] [11:53:06 AM PDT] It also means that code which opens and closes the file in multiple places/threads/forks doesn't affect the status of the lock, as the lock is actually on the temporary lockfile, not the file which you are wanting to protect (which is the major problem with fcntl)
[Monday, June 18, 2018] [11:53:48 AM PDT] it's a compounded problem then, it only really solves the 'local cache' which nfs3+ already does by giving the local client a link and keeping posix semantics on it .. but not 'really' on the networked file itself
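For reference, the pure-fcntl pitfall abadger1999 describes above (the fd1/fd2 sequence) in runnable form; a sketch, assuming a throwaway file named 'a' in the current directory:

```python
# Sketch of the POSIX record-lock pitfall described above; 'a' is just a
# throwaway demo file.  Closing *any* descriptor this process holds for the
# locked file silently releases the lock.
import fcntl

open('a', 'ab').close()          # make sure the demo file exists

fd1 = open('a', 'r+b')
fcntl.lockf(fd1, fcntl.LOCK_EX)  # this process now holds the lock on 'a'

fd2 = open('a', 'rb')            # a second, unrelated open of the same file...
fd2.close()                      # ...and closing it drops the lock held via fd1;
                                 # any other process blocked in lockf() on 'a'
                                 # can acquire it now, even though fd1 is still open.
```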
[Monday, June 18, 2018] [11:53:56 AM PDT] So yeah, his code is better than either pure fcntl or pure lockfile.
[Monday, June 18, 2018] [11:53:59 AM PDT] i believe gluster does similar
[Monday, June 18, 2018] [11:54:07 AM PDT] You're off on a different tangent.
[Monday, June 18, 2018] [11:54:15 AM PDT] no, i'm saying it adds more problems and only seems to solve one of the others
[Monday, June 18, 2018] [11:54:50 AM PDT] once you deal with more than 1 machine and rely on the network for concurrency ... you are doomed (the degree depends on how sync/async the network fs is)
[Monday, June 18, 2018] [11:57:49 AM PDT] it does add more problems. But those are minor problems. And it solves the major problem.
[Monday, June 18, 2018] [11:57:55 AM PDT] So it's a win.
[Monday, June 18, 2018] [11:58:07 AM PDT] There are no two machines and no network in this problem.
[Monday, June 18, 2018] [11:58:30 AM PDT] no, it does not solve the major problem, it seems to, but it still has a race condition and is only safe in 1 case: all edits are on the same machine
[Monday, June 18, 2018] [11:58:55 AM PDT] abadger1999: if you pose the problem like that, yes, but once you go to the 'real world', you have network mounts and N machines
[Monday, June 18, 2018] [11:59:27 AM PDT] and for that 1 case ... there is really not a problem since the kernel lock will handle it easily
[Monday, June 18, 2018] [11:59:49 AM PDT] you only have a problem once the network is involved
[Monday, June 18, 2018] [12:00:35 PM PDT] bcoca: Again, [11:54:07] You're off on a different tangent.
[Monday, June 18, 2018] [12:00:49 PM PDT] You're talking about the issues that we were talking about earlier in Slack.
[Monday, June 18, 2018] [12:00:54 PM PDT] This is something different.
[Monday, June 18, 2018] [12:01:00 PM PDT] illuminate me how
[Monday, June 18, 2018] [12:01:19 PM PDT] afaik this is still about locking a file, not sure how the diff cases won't apply
[Monday, June 18, 2018] [12:02:12 PM PDT] for single machine/no network, how does a lockfile help? what case does it solve that lockf/flock don't cover?
[Monday, June 18, 2018] [12:03:27 PM PDT] https://github.com/ansible/ansible/pull/41568 <= @abadger1999 actual tangent, this is ready to merge now, if you want to give final review
[Monday, June 18, 2018] [12:09:05 PM PDT] hi, sorry for the interruption, but i'm wondering: is there any official stance on what format should be used for function/class/method docstrings in modules and module utils?
[Monday, June 18, 2018] [12:12:45 PM PDT] fz if you are going to insist on using lockfiles, use O_SYNC to open them, that will at least minimize the race condition on certain filesystems
[Monday, June 18, 2018] [12:14:11 PM PDT] sync is a lie ... but at least stuff will try to be better about it
[Monday, June 18, 2018] [12:15:41 PM PDT] well I don't really insist, but I think the hybrid solution was the "lesser evil"
[Monday, June 18, 2018] [12:15:59 PM PDT] bcoca: http://toshio.fedorapeople.org/locking.py
[Monday, June 18, 2018] [12:16:06 PM PDT] i just follow the rule from fs developers: more files == more problems
[Monday, June 18, 2018] [12:16:11 PM PDT] bcoca: There's the problem that the hybrid approach fz is using solves
[Monday, June 18, 2018] [12:16:37 PM PDT] abadger1999: not a problem we have in modules
[Monday, June 18, 2018] [12:17:00 PM PDT] we don't open lock boundaries from core before forking
[Monday, June 18, 2018] [12:17:00 PM PDT] bcoca: yes, it is.
[Monday, June 18, 2018] [12:17:04 PM PDT] ???
[Monday, June 18, 2018] [12:17:12 PM PDT] bcoca: if it's not, then we have no reason to lock ever.
[Monday, June 18, 2018] [12:17:28 PM PDT] The fork is just an easy way to put all of that into one file.
[Monday, June 18, 2018] [12:18:00 PM PDT] The same problem exists if you have two entirely separate programs which both want to lock the same file.
[Monday, June 18, 2018] [12:18:05 PM PDT] modules don't inherit open files from the parent, that example just doesn't happen, modules are more like 2 diff processes on the target, unrelated to each other
[Monday, June 18, 2018] [12:18:23 PM PDT] abadger1999: the fd will not be the same then, since it is per pid
[Monday, June 18, 2018] [12:18:27 PM PDT] bcoca: right. And the problem will happen in the case of 2 different processes on the target.
[Monday, June 18, 2018] [12:18:32 PM PDT] the problem in your example is that they share the fd
[Monday, June 18, 2018] [12:18:57 PM PDT] don't open the file in the parent in your example
[Monday, June 18, 2018] [12:19:21 PM PDT] bcoca: nope, it's not.
[Monday, June 18, 2018] [12:19:55 PM PDT] bcoca: I just changed it to use a different fd on the child and uploaded. It has the same buggy^Wunexpected behaviour.
[Monday, June 18, 2018] [12:20:15 PM PDT] fd to the same file as the parent, yes; fd across 2 independent forks, no
[Monday, June 18, 2018] [12:20:55 PM PDT] bcoca: Fine, you write an example and I'll see if you're thinking of the same scenario as I am.
[Monday, June 18, 2018] [12:21:14 PM PDT] AFAICS, this is the documented POSIX behaviour which everyone hates.
[Monday, June 18, 2018] [12:21:53 PM PDT] i believe flock doesn't have it .. or is it lockf? they all have weird corner cases, but a lockfile won't fix that
[Monday, June 18, 2018] [12:22:07 PM PDT] lockfile hybrid, as your lock on the lockfile has the same issue
[Monday, June 18, 2018] [12:22:41 PM PDT] bcoca: correct. it is a POSIX API problem. Some flocks have the problem because they aren't true BSD semantics, they're implemented on top of POSIX.
[Monday, June 18, 2018] [12:23:04 PM PDT] bcoca: systems which have true BSD semantics for flock() should not suffer this behaviour.
[Monday, June 18, 2018] [12:23:24 PM PDT] lockfile fixes this.
[Monday, June 18, 2018] [12:23:27 PM PDT] if i separate parent and child into separate files and run them with x & y, it doesn't seem to lose the lock
[Monday, June 18, 2018] [12:23:33 PM PDT] because the lockfile is temporary.
[Monday, June 18, 2018] [12:23:46 PM PDT] ^ crab that is my point
[Monday, June 18, 2018] [12:23:55 PM PDT] The code has no reason to do the equivalent of the fd2 in my example.
[Monday, June 18, 2018] [12:25:09 PM PDT] the posix problem is when the fd is shared, which ONLY happens in the same process and its children, 2 modules executing on a machine won't be in that relationship
[Monday, June 18, 2018] [12:26:39 PM PDT] crab: I still have the error condition:
[Monday, June 18, 2018] [12:26:41 PM PDT] [pts/5@peru ~]$ ./locking2.py & ./locking3.py & (12:26:01)
[Monday, June 18, 2018] [12:26:41 PM PDT] [1] 28752
[Monday, June 18, 2018] [12:26:41 PM PDT] [2] 28753
[Monday, June 18, 2018] [12:26:41 PM PDT] [pts/5@peru ~]$ child started (12:26:04)
[Monday, June 18, 2018] [12:26:41 PM PDT] parent started
[Monday, June 18, 2018] [12:26:41 PM PDT] child: file is properly locked
[Monday, June 18, 2018] [12:26:41 PM PDT] parent continuing
[Monday, June 18, 2018] [12:26:41 PM PDT] child continuing
[Monday, June 18, 2018] [12:26:41 PM PDT] child: file is no longer locked
[Monday, June 18, 2018] [12:26:41 PM PDT] child exiting
[Monday, June 18, 2018] [12:26:41 PM PDT] [2] + 28753 done ./locking3.py
[Monday, June 18, 2018] [12:26:41 PM PDT] [pts/5@peru ~]$ parent exiting
[Monday, June 18, 2018] [12:28:36 PM PDT] bcoca: that is incorrect. closing *any* file descriptor for the file in the process which holds the lock releases the lock, which allows *any* other process to then acquire the lock.
[Monday, June 18, 2018] [12:29:22 PM PDT] processes only share file descriptors from parent/child ... a new pid gets a new fd even if it is the same file
[Monday, June 18, 2018] [12:29:38 PM PDT] for diff processes, unrelated
[Monday, June 18, 2018] [12:30:05 PM PDT] file descriptors are PER process, so 2 diff utilities cannot have the same descriptor
[Monday, June 18, 2018] [12:30:34 PM PDT] bcoca: That is correct. But that doesn't have anything to do with flock and fcntl.lockf. If it did, it would make file locking useless.
[Monday, June 18, 2018] [12:30:48 PM PDT] no, cause the LOCK is in the kernel
[Monday, June 18, 2018] [12:30:56 PM PDT] EXACTLY.
[Monday, June 18, 2018] [12:31:22 PM PDT] and deals with the KERNEL ref, but not the process ref, the process ref is how the user space knows to tell the kernel, but not how the kernel keeps track
[Monday, June 18, 2018] [12:31:30 PM PDT] which is why this problem exists whether or not the two processes share any parentage.
[Monday, June 18, 2018] [12:31:52 PM PDT] no locking would ever work in that case as any process that finishes would destroy all locks on the file
[Monday, June 18, 2018] [12:32:04 PM PDT] the bug relates only to 'child procs' or 'same proc'
[Monday, June 18, 2018] [12:32:09 PM PDT] No.
[Monday, June 18, 2018] [12:32:14 PM PDT] You don't understand the issue.
[Monday, June 18, 2018] [12:32:49 PM PDT] abadger1999: yeah, you're right.
[Monday, June 18, 2018] [12:32:54 PM PDT] Thanks :-)
[Monday, June 18, 2018] [12:33:49 PM PDT] it was "working" for me because of a boneheaded error.
[Monday, June 18, 2018] [12:33:59 PM PDT] once i fixed that, it started breaking.
[Monday, June 18, 2018] [12:34:48 PM PDT] so yeah, the fd doesn't have to be shared. closing any fd for the file will release the lock.
[Monday, June 18, 2018] [12:35:54 PM PDT] https://gist.github.com/bcoca/e71998e88662bde276eeec45504ac008 <= your example 'never' works for me
[Monday, June 18, 2018] [12:36:07 PM PDT] nor fails the same way that you posted
[Monday, June 18, 2018] [12:38:36 PM PDT] bcoca: What are you running on?
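The locking2.py/locking3.py scripts themselves are not in the log; the following is a rough, hypothetical reconstruction of a test that produces output like the above: one process takes a lockf() lock and then opens and closes a second descriptor for the same file, while an unrelated process probes the lock with a non-blocking lockf().

```python
# --- locking2.py: hypothetical "parent"-style holder (a reconstruction, not the real script) ---
import fcntl
import time

print('parent started')
open('locktest', 'ab').close()            # ensure the demo file exists
held = open('locktest', 'r+b')
fcntl.lockf(held, fcntl.LOCK_EX)          # take the lock
time.sleep(2)                             # let the other process observe it
print('parent continuing')
other = open('locktest', 'rb')            # unrelated open of the same file...
other.close()                             # ...which releases the lock held via `held`
time.sleep(2)
print('parent exiting')

# --- locking3.py: hypothetical unrelated prober (a reconstruction, not the real script) ---
import fcntl
import time

def lock_is_held():
    probe = open('locktest', 'r+b')
    try:
        fcntl.lockf(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(probe, fcntl.LOCK_UN)
        return False                      # we got the lock, so nobody else holds it
    except IOError:
        return True                       # somebody else holds the lock
    finally:
        probe.close()

print('child started')
open('locktest', 'ab').close()            # ensure the demo file exists
time.sleep(1)
print('child: file is properly locked' if lock_is_held() else 'child: file is NOT locked')
time.sleep(2)
print('child: file is no longer locked' if not lock_is_held() else 'child: file is still locked')
print('child exiting')
```

Run as two separate processes (./locking2.py & ./locking3.py &), the prober sees the lock disappear even though the holder never touched the descriptor it locked.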
[Monday, June 18, 2018] [12:39:03 PM PDT] currently, gentoo 4.16.5
[Monday, June 18, 2018] [12:39:28 PM PDT] type ext4 (rw,noatime,nodiratime,data=writeback)
[Monday, June 18, 2018] [12:39:53 PM PDT] python 2.7.14
[Monday, June 18, 2018] [12:42:42 PM PDT] https://irc.toroid.org/paste/050ffba01d781944e956/152935096181291
[Monday, June 18, 2018] [12:42:59 PM PDT] that's from fs/locks.c
[Monday, June 18, 2018] [12:43:14 PM PDT] but dup only happens in parent/child
[Monday, June 18, 2018] [12:43:21 PM PDT] you don't dup file descriptors across procs
[Monday, June 18, 2018] [12:44:18 PM PDT] see 'current process' in what you posted, yes, it breaks for doing multiple opens in a single proc or its children but not ACROSS procs, which would be what modules are
[Monday, June 18, 2018] [12:44:28 PM PDT] Ahha, that's why your behaviour seems buggy.
[Monday, June 18, 2018] [12:44:58 PM PDT] Your fcntl is no longer able to synchronize writes between a parent and child because of that.
[Monday, June 18, 2018] [12:45:04 PM PDT] the bug only affects a single badly coded module, not across modules
[Monday, June 18, 2018] [12:45:18 PM PDT] *sigh*
[Monday, June 18, 2018] [12:45:29 PM PDT] ?? it clearly states 'current process'
[Monday, June 18, 2018] [12:48:58 PM PDT] bcoca: Correct. But you are arguing for something else.
[Monday, June 18, 2018] [12:49:30 PM PDT] i'm confused, as my whole point was that it is not a bug across modules since they are diff procs
[Monday, June 18, 2018] [12:49:34 PM PDT] As crab and I proved by running it in two unrelated processes instead of via a fork.
[Monday, June 18, 2018] [12:49:46 PM PDT] Yes, you are confused.
[Monday, June 18, 2018] [12:50:00 PM PDT] i cannot reproduce even the simple case (probably my system has a fix?)
[Monday, June 18, 2018] [12:50:08 PM PDT] * bcoca goes in hunt of a VM
[Monday, June 18, 2018] [12:50:11 PM PDT] No.
[Monday, June 18, 2018] [12:50:32 PM PDT] as a slight diversion from this conversation, who wants to merge this very simple/safe PR? https://github.com/ansible/ansible/pull/41665
[Monday, June 18, 2018] [12:50:33 PM PDT] Your system might be different if that comment is from a more recent kernel than mine.
[Monday, June 18, 2018] [12:51:13 PM PDT] But the behaviour is not showing that there is no problem. It is showing that you cannot test for the presence of the issue in that way.
[Monday, June 18, 2018] [12:51:31 PM PDT] On your system, you apparently *must* have two unrelated processes to provoke the issue.
[Monday, June 18, 2018] [12:51:56 PM PDT] i don't see how
[Monday, June 18, 2018] [12:52:01 PM PDT] @crab, let tests run, jic
[Monday, June 18, 2018] [12:52:14 PM PDT] That comment you found says that the child and parent will share the locks at the time of the fork. Which is why you are seeing "child: file is no longer locked" twice.
[Monday, June 18, 2018] [12:53:05 PM PDT] The behaviour most people would want is for it to say "child: file is properly locked" twice
[Monday, June 18, 2018] [12:53:10 PM PDT] cause the parent already locked
[Monday, June 18, 2018] [12:53:12 PM PDT] ?
[Monday, June 18, 2018] [12:53:28 PM PDT] The POSIX behaviour with two separate processes should be "child: file is properly locked" followed by "child: file is no longer locked"
[Monday, June 18, 2018] [12:53:48 PM PDT] 2 separate procs, there are no children
[Monday, June 18, 2018] [12:54:05 PM PDT] bcoca: correct. But I'm assuming you don't change the print statements.
[Monday, June 18, 2018] [12:54:47 PM PDT] i'm still trying to figure out why i cannot reproduce your test case ...
[Monday, June 18, 2018] [12:54:57 PM PDT] I'm not sure why they would have changed that so that the locks are inherited by the child... It makes file locking even harder to use when they do that.
[Monday, June 18, 2018] [12:55:30 PM PDT] bcoca: check whether the code/comment is the same in older kernels... it feels like the comment you found is a change in behaviour.
[Monday, June 18, 2018] [12:55:32 PM PDT] it's always been hard with parent/child cause file descriptors are shared, what makes no sense is this affecting diff procs as FDs are not shared
[Monday, June 18, 2018] [12:55:43 PM PDT] crab found
[Monday, June 18, 2018] [12:55:44 PM PDT] Hmm..
[Monday, June 18, 2018] [12:56:11 PM PDT] Does "gentoo 4.16.5" mean Linux kernel 4.16.5?
[Monday, June 18, 2018] [12:56:12 PM PDT] independent procs don't share memory mappings, an fd is a form of memory mapping
[Monday, June 18, 2018] [12:56:15 PM PDT] yes
[Monday, June 18, 2018] [12:56:16 PM PDT] I'm on 4.16.13-200.fc27.x86_64.
[Monday, June 18, 2018] [12:56:21 PM PDT] bcoca: I'm using 4.16.11-gentoo python 2.7.14 and I can reproduce
[Monday, June 18, 2018] [12:56:28 PM PDT] * bcoca is compiling 4.17.0 ...
[Monday, June 18, 2018] [12:56:30 PM PDT] I have different mount opts tho
[Monday, June 18, 2018] [12:56:57 PM PDT] issue due to writeback?
[Monday, June 18, 2018] [12:57:21 PM PDT] bcoca: are you using a non-glibc libc or something like that?
[Monday, June 18, 2018] [12:57:21 PM PDT] the problem with o_sync <= OS/kernel lies to you, passes the buck to the fs driver <= fs driver lies to you, passes the buck to the device driver <= device driver lies to you, depends on internal write code/cache/network
[Monday, June 18, 2018] [12:57:27 PM PDT] bcoca: right. From everything you are saying (which I assume are the things you think are important) perhaps the problem is that you are thinking that POSIX file locking behaves more like BSD file locks than it really does.
[Monday, June 18, 2018] [12:57:28 PM PDT] ulibc ...
[Monday, June 18, 2018] [12:57:35 PM PDT] agaffney: ding ding ding
[Monday, June 18, 2018] [12:58:05 PM PDT] abadger1999: depends on the lock mechanism, flock and lockf might use some common functions but they don't work the same
[Monday, June 18, 2018] [12:58:10 PM PDT] so...glibc is "broken"? :)
[Monday, June 18, 2018] [12:58:14 PM PDT] probably
[Monday, June 18, 2018] [12:58:15 PM PDT] bcoca: Exactly.
[Monday, June 18, 2018] [12:58:37 PM PDT] This is an issue for lockf and not for flock because of those differences.
[Monday, June 18, 2018] [12:59:14 PM PDT] lockf associates a lock with a file, whereas flock associates a lock with a file descriptor.
[Monday, June 18, 2018] [12:59:22 PM PDT] but file descriptors are independent from that, they always work as unique per process space
[Monday, June 18, 2018] [12:59:33 PM PDT] So with flock you can close an unrelated file descriptor and retain the lock in that program.
[Monday, June 18, 2018] [1:00:02 PM PDT] But with lockf, if you close an unrelated file descriptor to the same file as the lock, the lock gets dropped.
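A small sketch of that difference, assuming a throwaway file 'demo' and a system whose flock() has real BSD semantics (i.e. not emulated on top of POSIX locks):

```python
# Contrast sketch of flock vs lockf, as described above (assumes real BSD-style flock()).
import fcntl

open('demo', 'ab').close()       # throwaway demo file

# flock: the lock rides on the open descriptor you locked, so closing some
# other fd for the same file leaves it alone.
a = open('demo', 'r+b')
fcntl.flock(a, fcntl.LOCK_EX)
b = open('demo', 'rb')
b.close()                        # lock taken via `a` is still held
fcntl.flock(a, fcntl.LOCK_UN)
a.close()

# lockf: the lock belongs to (process, file), so closing *any* fd this
# process has for the file drops it.
c = open('demo', 'r+b')
fcntl.lockf(c, fcntl.LOCK_EX)
d = open('demo', 'rb')
d.close()                        # lock taken via `c` is gone; another process
                                 # waiting on 'demo' could acquire it now.
```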
[Monday, June 18, 2018] [1:01:19 PM PDT] agaffney: ulibc would seem to have the less useful behaviour ;-) I'm not sure if POSIX specifies which of those is supposed to be the behaviour with POSIX file locks and a fork, though.
[Monday, June 18, 2018] [1:01:23 PM PDT] if that is so, that is a bad lockf implementation on that platform, as it is not using the descriptor, which is how you prevent 'cross process' problems, I have used it mostly on bsd and can tell you that is not the case there
[Monday, June 18, 2018] [1:01:29 PM PDT] lockf
[Monday, June 18, 2018] [1:01:36 PM PDT] bcoca: No. that's POSIX behaviour.
[Monday, June 18, 2018] [1:01:50 PM PDT] there are lots of things that POSIX doesn't specify that it really should
[Monday, June 18, 2018] [1:01:56 PM PDT] POSIX behaviour is an oxymoron, implementations differ even when the standard is clear
[Monday, June 18, 2018] [1:04:03 PM PDT] https://gavv.github.io/blog/file-locks/#posix-record-locks-fcntl
[Monday, June 18, 2018] [1:04:15 PM PDT] all file descriptors opened by the same process for the same file refer to the same lock (even distinct file descriptors, e.g. created using two open() calls)
[Monday, June 18, 2018] [1:04:38 PM PDT] http://pubs.opengroup.org/onlinepubs/7908799/xsh/lockf.html
[Monday, June 18, 2018] [1:05:04 PM PDT] so.. currently using flock. should I go ahead with this PR and try with my hybrid solution?
[Monday, June 18, 2018] [1:05:08 PM PDT] "File locks are released on first close by the locking process of any file descriptor for the file"
[Monday, June 18, 2018] [1:05:32 PM PDT] ^ so that seems to have been mistaken, 'the locking process' would be one, not multiple diff ones
[Monday, June 18, 2018] [1:05:41 PM PDT] This is bcoca " File locks are released on first close by the locking process of any file descriptor for the file. " jinx
[Monday, June 18, 2018] [1:06:17 PM PDT] fz: Your hybrid solution will work fine with either flock or lockf. That's why I like it :-)
[Monday, June 18, 2018] [1:06:45 PM PDT] "file locks are released on first close by the locking process of any file descriptor for the file" is exactly what i'm seeing with abadger's code.
[Monday, June 18, 2018] [1:06:46 PM PDT] yes, the key is that they refer to a single process opening the file multiple times, not multiple processes, the latter would clearly be a bug
[Monday, June 18, 2018] [1:06:54 PM PDT] bcoca: That is the problem in the POSIX standard that having a lockfile + fcntl hybrid works around.
[Monday, June 18, 2018] [1:07:13 PM PDT] once the lock is released, another process can acquire it.
[Monday, June 18, 2018] [1:07:20 PM PDT] how so, as the lock is as unreliable on the original as on the lockfile
[Monday, June 18, 2018] [1:07:24 PM PDT] ?
[Monday, June 18, 2018] [1:07:31 PM PDT] Yep. and the lock is released when *any* file descriptor in the process is closed.
[Monday, June 18, 2018] [1:07:40 PM PDT] from the standard and how the function describes it, it's a single process
[Monday, June 18, 2018] [1:08:00 PM PDT] your reading is mistaken.
[Monday, June 18, 2018] [1:08:21 PM PDT] it would make sense only if the lock were somehow private to the process. which would make it pretty useless.
[Monday, June 18, 2018] [1:08:27 PM PDT] "all file descriptors opened by the same process for the same file refer to the same lock"
[Monday, June 18, 2018] [1:08:55 PM PDT] yes. and closing any one of them releases the lock. whereupon another process, ANY process, can acquire it.
[Monday, June 18, 2018] [1:08:56 PM PDT] no, the FD is private to the process, the lock on the kernel side is kept per process, not globally
[Monday, June 18, 2018] [1:09:06 PM PDT] bcoca: You need to write some code to show what you think is working... I don't know what you are arguing about now that you've found the relevant, problematic portion of the spec.
[Monday, June 18, 2018] [1:09:24 PM PDT] so if a process acquires a lock on a file, it locks out only… that process? what?
[Monday, June 18, 2018] [1:09:40 PM PDT] i'm not seeing where the spec says proc1 closing file A breaks the lock that proc2 had when they are unrelated
[Monday, June 18, 2018] [1:09:50 PM PDT] bcoca: if you remove "no" from your reply, you are on safe ground. But your "no" says that you don't agree with us, which means you're wrong.
[Monday, June 18, 2018] [1:10:01 PM PDT] but... that is exactly what the spec says.
[Monday, June 18, 2018] [1:10:06 PM PDT] bcoca: Correct. and that is not what we've been saying happens.
[Monday, June 18, 2018] [1:10:27 PM PDT] if process A acquires a lock and then releases the lock, process B can acquire it.
[Monday, June 18, 2018] [1:10:40 PM PDT] and process A can release the lock by closing ANY file descriptor for the locked file.
[Monday, June 18, 2018] [1:10:41 PM PDT] We've been saying process one has two fds for the same file. It locks on the first fd. It closes the second fd. Process one no longer holds a lock on the file.
[Monday, June 18, 2018] [1:10:42 PM PDT] yes, but process B closing the file does not mean process A loses the lock
[Monday, June 18, 2018] [1:10:55 PM PDT] no, but nobody said it did.
[Monday, June 18, 2018] [1:11:06 PM PDT] but that would be the case for 2 task executions/modules
[Monday, June 18, 2018] [1:11:15 PM PDT] doing it in a single process for 2 modules would never happen
[Monday, June 18, 2018] [1:11:29 PM PDT] bcoca: no. here's the use case with two modules needing to lock the same file.
[Monday, June 18, 2018] [1:11:41 PM PDT] that is my whole point, the bug is meaningless when the parallelism we want to deal with has to do with PARALLEL processes, unrelated to each other
[Monday, June 18, 2018] [1:12:07 PM PDT] the relation of processes to each other has nothing to do with the problem that abadger is trying to point out.
[Monday, June 18, 2018] [1:12:20 PM PDT] the point is that the locking process must be careful to not inadvertently release the lock.
[Monday, June 18, 2018] [1:12:26 PM PDT] but the problem he pointed out has no bearing on modules as they are 2 diff processes
[Monday, June 18, 2018] [1:12:32 PM PDT] not that an unrelated process may acquire the lock by mistake.
[Monday, June 18, 2018] [1:12:44 PM PDT] @crab yes, agreed, but that is not solved by a lockfile
[Monday, June 18, 2018] [1:13:04 PM PDT] as the 'lock problem' persists with the lockfile, you just have something extra to manage
[Monday, June 18, 2018] [1:13:13 PM PDT] process 1 locks the file with fd1. process 2 requests a lock on the file. process 1 opens and closes the file using a separate file descriptor (for instance, calling a library function that uses that file under the hood). process 2 takes the lock and starts writing to the file. process 1 writes to the file because it assumed it still had the lock.
[Monday, June 18, 2018] [1:13:50 PM PDT] it is "solved" by a lockfile only in the sense that you can be a little more confident that you're not going to open and close the lockfile inadvertently.
[Monday, June 18, 2018] [1:14:00 PM PDT] process1 has a bug, that is all you are saying; if a module releases a lock prematurely, it's a bug in the module, but that is not fixed by a lockfile as the lock could still be released prematurely there too
[Monday, June 18, 2018] [1:14:04 PM PDT] but if you tried, you could certainly do something to break it even if you were using a lockfile.
[Monday, June 18, 2018] [1:14:47 PM PDT] * bcoca finds he has lockf calls disabled in his kernel .... explains the test case never locking
[Monday, June 18, 2018] [1:15:02 PM PDT] wow. how did you even manage to do that?
[Monday, June 18, 2018] [1:15:03 PM PDT] The lockfile solves that because the sequence now looks like this: process 1 locks the lockfile associated with the file it wants. process 2 requests a lock on that same lockfile. process 1 opens and closes the file it wants using a separate file descriptor (for instance, calling a library). process 2 cannot acquire the lock on the separate lockfile because the separate lockfile is not the file that was opened. process 1 writes to the file and then releases the lock.
[Monday, June 18, 2018] [1:15:03 PM PDT] you dirty gentoo ricer
[Monday, June 18, 2018] [1:15:21 PM PDT] crab: speedup patch, made it a noop, since i don't use things that are very concurrent on my desktop
[Monday, June 18, 2018] [1:15:42 PM PDT] abadger1999: process1 could also close the lockfile prematurely
[Monday, June 18, 2018] [1:15:50 PM PDT] don't you do a lot of concurrent node.js dev in your spare time?
[Monday, June 18, 2018] [1:15:53 PM PDT] the real fix is don't close files until you are done with them
[Monday, June 18, 2018] [1:15:57 PM PDT] bcoca: yes, but that takes conscious effort to do.
[Monday, June 18, 2018] [1:16:06 PM PDT] jtanner: i mask so nodejs is never installed
[Monday, June 18, 2018] [1:16:29 PM PDT] abadger1999: both cases take conscious effort, and the lockfile creates more cleanup problems than it solves for a corner case of bad coding on lockf
[Monday, June 18, 2018] [1:17:18 PM PDT] an open_with_lock function and a warning to not close the file themselves should be good enough
[Monday, June 18, 2018] [1:17:38 PM PDT] whereas lock('/etc/passwd'); os.chown('/etc/passwd', pwd.getpwnam(user)[0]) does not.
[Monday, June 18, 2018] [1:18:21 PM PDT] wow, that's a shockingly good example.
[Monday, June 18, 2018] [1:19:09 PM PDT] i would shoot people doing that .. but we do worse in the user module (/etc/shadow)
[Monday, June 18, 2018] [1:19:17 PM PDT] * abadger1999 can't take credit... that's one of the many examples I found on the internet today
[Monday, June 18, 2018] [1:23:21 PM PDT] wonder if we should just lock something like the module source file that isn't going to be reopened
[Monday, June 18, 2018] [1:24:07 PM PDT] https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/system/user.py#L1564 <= @abadger1999 actual tangent, just reminded me, we use str() in the user module, think it should be to_native?
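A sketch of the lockfile + fcntl hybrid under discussion (hypothetical helper and '.lock' naming, not fz's actual PR code): the kernel lock is taken on a dedicated lockfile, so opening and closing the protected file itself, directly or via a library call like the chown example above, cannot silently release it.

```python
# Hypothetical sketch of the hybrid approach; helper name and lockfile
# naming are made up, not taken from the PR under review.
import fcntl
from contextlib import contextmanager

@contextmanager
def locked(path):
    """Hold an exclusive lock for *path* by locking a separate lockfile.

    Because the kernel lock lives on the lockfile, the protected file can be
    opened and closed freely inside the with-block without dropping the lock.
    If the process dies, the kernel releases the lock; a leftover lockfile is
    harmless since nothing is locked on it any more.
    """
    lockfile = path + '.lock'
    lf = open(lockfile, 'ab')
    try:
        fcntl.lockf(lf, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.lockf(lf, fcntl.LOCK_UN)
        lf.close()

# usage sketch:
# with locked('/etc/passwd'):
#     os.chown('/etc/passwd', uid, gid)   # touching the protected file here no
#                                         # longer affects the fd holding the lock
```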
[Monday, June 18, 2018] [1:24:15 PM PDT] it would only lock out concurrent executions of the same module, but that would already be very useful
[Monday, June 18, 2018] [1:24:19 PM PDT] it later gets written directly w/o encoding
[Monday, June 18, 2018] [1:24:47 PM PDT] crab: not really, even concurrent executions would use diff dirs, but that is something we are looking to optimize
[Monday, June 18, 2018] [1:25:10 PM PDT] hosts: all, lineinfile: path=/etc/hosts delegate_to: localhost
[Monday, June 18, 2018] [1:25:37 PM PDT] ^ update /etc/hosts with all hosts in my inventory, in parallel; this is a common issue when we hit 'lack of locking'
[Monday, June 18, 2018] [1:25:59 PM PDT] yes.
[Monday, June 18, 2018] [1:26:24 PM PDT] but that is the simple case, either flock/lockf should fix it unless someone does something silly
[Monday, June 18, 2018] [1:26:44 PM PDT] the hard case is delegate_to: targetX/Y/Z when they point at an nfs share of the same file
[Monday, June 18, 2018] [1:27:10 PM PDT] or w/o delegate_to, hosts: all, - lineinfile: path=/to/network/share/samefile
[Monday, June 18, 2018] [1:27:29 PM PDT] ^ at which point the proper response is DONT DO THAT
[Monday, June 18, 2018] [1:29:43 PM PDT] https://irc.toroid.org/paste/050ffba01d781944e956/152935378348311
[Monday, June 18, 2018] [1:31:03 PM PDT] (that form of invocation of flock(1) acquires the lock on fd 9 instead of a named file)