Author Topic: stop-build does not propagate SIGINT to build script on linux x64

rowbearto

  • Senior Community Member
  • Posts: 2335
  • Hero Points: 132
Using SE 22 beta1 on Linux x64 (CentOS 7). This issue also happened in SE 21. It is very annoying, and it would be great if you could fix it in SE 22.

Open attached workspace, stopbuild.vpw
Make active the build configuration "Debug"
Build (this will start a script that prints something every one second)
Execute "stop-build" or "Build->Stop Build"

You will see that the script is not stopped, it keeps going!

Now if you run the script directly in the process buffer, type:

$ ./testbuild

When you execute "stop-build" or press "Ctrl+C", the script actually stops and you will see:

"testbuild got SIGINT" in the output.

But if you do an actual build through SE, it is impossible to stop this script.

It would be really helpful if SE can propagate the SIGINT to the launched script. Due to this issue, I need to find the pid and execute "kill -2 PID" commands outside of SE to stop it, which is very annoying.

rowbearto

  • Senior Community Member
  • Posts: 2335
  • Hero Points: 132
Re: stop-build does not propagate SIGINT to build script on linux x64
« Reply #1 on: August 05, 2017, 08:25:30 PM »
Attached is a more general example where the first script (testbuild) calls a 2nd script (testbuild1) which should also get a SIGINT if SE22 gets modified. Additionally I have added a second build configuration (DebugPty) that runs the script in a pseudo-terminal so that it can have its own process group that can be killed.

Note that according to this page:
https://unix.stackexchange.com/questions/149741/why-is-sigint-not-propagated-to-child-process-when-sent-to-its-parent-process

Quote
When you press CTRL+C, your terminal emulator sends an ETX character (end-of-text / 0x03).
The TTY is configured such that when it receives this character, it sends a SIGINT to the foreground process group of the terminal. This configuration can be viewed by doing stty and looking at intr = ^C;. The POSIX specification says that when INTR is received, it should send a SIGINT to the foreground process group of that terminal.

So SE should send a SIGINT (2) to the entire process group when receiving the Ctrl+C.

But there is a problem here: the process group also contains vs_exe and the shell that invoked vs_exe, and these should not get the SIGINT. For example, when I do a build in SE, we see all the following processes in the same process group:

First, find what process group our script is in:
Code: [Select]
$ ps xao pid,ppid,pgid,sid,cmd | grep testbuild
18138 18125 18125 10505 /home/rbresali/pen/slickedit/se_latest_beta_64/bin/vsbuild -signal 56556 -command /home/rbresali/sebugs/stop_build//testbuild
18139 18138 18139 10505 bash /home/rbresali/sebugs/stop_build/testbuild
18140 18139 18139 10505 bash ./testbuild1
18655 10505 18654 10505 grep --color=auto testbuild

The last numerical column has process group (pgid) of 10505, list all processes in process group 10505:
Code: [Select]
$ ps xao pid,ppid,pgid,sid,cmd | grep 10505   
10505  6417 10505 10505 /bin/ksh
12266     1 12256 10505 sh /home/rbresali/pen/slickedit/se_latest_beta_64/bin/vs -st 0 -sc /home/rbresali/pen/slickedit/config +new
12272 12266 12272 10505 /home/rbresali/pen/slickedit/se_latest_beta_64/bin/vs_exe -st 0 -sc /home/rbresali/pen/slickedit/config +new
18125 12272 18125 10505 /home/rbresali/pen/slickedit/se_latest_beta_64/bin/secsh -i
18138 18125 18125 10505 /home/rbresali/pen/slickedit/se_latest_beta_64/bin/vsbuild -signal 56556 -command /home/rbresali/sebugs/stop_build//testbuild
18139 18138 18139 10505 bash /home/rbresali/sebugs/stop_build/testbuild
18140 18139 18139 10505 bash ./testbuild1
18747 18140 18139 10505 sleep 1
18748 10505 18748 10505 ps xao pid,ppid,pgid,sid,cmd
18749 10505 18748 10505 grep --color=auto 10505

So we see here there is more in the process group that we don't want to kill!

I think the solution is that SE should create a new process group to run the build. But how? It can be done with a pseudo-terminal.

Inspired by this post on stackoverflow:

https://stackoverflow.com/questions/1401002/trick-an-application-into-thinking-its-stdin-is-interactive-not-a-pipe

I found we can run in a pseudo-terminal, and get the exit code of the child process by doing:

Code: [Select]
script -eqfc "<command_to_run>" /dev/null
So in the "DebugPty" build configuration, I have configured the build to do:

Code: [Select]
script -eqfc "%rp/testbuild" /dev/null
Upon running the build after making "DebugPty" the active configuration, we can see that my build scripts (testbuild, testbuild1) are in their own unique process group now:

Code: [Select]
$ ps xao pid,ppid,pgid,sid,cmd | grep testbuild
10288 18125 18125 10505 /home/rbresali/pen/slickedit/se_latest_beta_64/bin/vsbuild -signal 56556 -command script -eqfc /home/rbresali/sebugs/stop_build//testbuild /dev/null
10289 10288 10289 10505 /bin/script -eqfc /home/rbresali/sebugs/stop_build//testbuild /dev/null
10291 10289 10289 10505 /bin/script -eqfc /home/rbresali/sebugs/stop_build//testbuild /dev/null
10292 10291 10292 10292 bash /home/rbresali/sebugs/stop_build//testbuild
10293 10292 10292 10292 bash ./testbuild1
10503 10505 10502 10505 grep --color=auto testbuild

The last column shows they are in process group 10292, grepping for that:

Code: [Select]
$ ps xao pid,ppid,pgid,sid,cmd | grep 10292   
10292 10291 10292 10292 bash /home/rbresali/sebugs/stop_build//testbuild
10293 10292 10292 10292 bash ./testbuild1
11015 10293 10292 10292 sleep 1
11017 10505 11016 10505 grep --color=auto 10292

We can see that only the scripts we want to terminate are in this process group 10292. So if SE can go through this process hierarchy as follows, it can discover the process group:

Get the PID of vsbuild:
Code: [Select]
$ ps xao pid,ppid,pgid,sid,cmd | grep vsbuild
10288 18125 18125 10505 /home/rbresali/pen/slickedit/se_latest_beta_64/bin/vsbuild -signal 56556 -command script -eqfc /home/rbresali/sebugs/stop_build//testbuild /dev/null


PID of vsbuild is 10288, find its children (this can probably be done inside of vsbuild instead of using "ps"):
Code: [Select]
$ ps xao pid,ppid,pgid,sid,cmd | grep 10288
10288 18125 18125 10505 /home/rbresali/pen/slickedit/se_latest_beta_64/bin/vsbuild -signal 56556 -command script -eqfc /home/rbresali/sebugs/stop_build//testbuild /dev/null
10289 10288 10289 10505 /bin/script -eqfc /home/rbresali/sebugs/stop_build//testbuild /dev/null

The first "script" child process is at PID 10289. 10289's children are:

Code: [Select]
$ ps xao pid,ppid,pgid,sid,cmd | grep 10289
10289 10288 10289 10505 /bin/script -eqfc /home/rbresali/sebugs/stop_build//testbuild /dev/null
10291 10289 10289 10505 /bin/script -eqfc /home/rbresali/sebugs/stop_build//testbuild /dev/null

Its child is 10291, a 2nd "script" process, then we can find the 2nd script's children, which is our build script:

Code: [Select]
$ ps xao pid,ppid,pgid,sid,cmd | grep 10291
10291 10289 10289 10505 /bin/script -eqfc /home/rbresali/sebugs/stop_build//testbuild /dev/null
10292 10291 10292 10292 bash /home/rbresali/sebugs/stop_build//testbuild

Now we can see our actual build script has PID=10292 and its process group is also 10292, so it is the head of its process group and it is safe to kill the entire process group.
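The walk above can be sketched in Python over the same rows. This is only an illustration: the rows are copied from the DebugPty listing with the command shortened to one word, the function names are made up, and it matches on the ppid column instead of grepping for the number anywhere in the line.

```python
# Rows from the DebugPty listing above: pid, ppid, pgid, sid, shortened cmd.
PS_ROWS = """\
10288 18125 18125 10505 vsbuild
10289 10288 10289 10505 script
10291 10289 10289 10505 script
10292 10291 10292 10292 bash
10293 10292 10292 10292 bash
"""

def parse(text):
    """Turn ps-style rows into {pid: {"ppid", "pgid", "sid", "cmd"}}."""
    rows = {}
    for line in text.splitlines():
        pid, ppid, pgid, sid, cmd = line.split(None, 4)
        rows[int(pid)] = {"ppid": int(ppid), "pgid": int(pgid),
                          "sid": int(sid), "cmd": cmd}
    return rows

def descendants(rows, root):
    """Return every process below root, parents before their children."""
    found, frontier = [], [root]
    while frontier:
        parent = frontier.pop()
        kids = sorted(pid for pid, row in rows.items() if row["ppid"] == parent)
        found.extend(kids)
        frontier.extend(kids)
    return found

def group_leaders(rows, pids):
    """Keep only the processes that head their own process group."""
    return [pid for pid in pids if rows[pid]["pgid"] == pid]

if __name__ == "__main__":
    rows = parse(PS_ROWS)
    below_vsbuild = descendants(rows, 10288)
    print(below_vsbuild)                       # [10289, 10291, 10292, 10293]
    print(group_leaders(rows, below_vsbuild))  # [10289, 10292]
```

Of the two leaders, 10289 heads the group holding the two "script" wrapper processes, and 10292 heads the group that contains only the build scripts.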

So now we can safely send SIGINT to this entire process group (note kill to negative 10292 = -10292, to indicate entire process group):

Code: [Select]
kill -2 -10292
and script with all its children gets properly killed!

So if SE can make sure the build scripts are in their own process group, and then send the SIGINT to the entire process group, this will emulate exactly what happens when one runs the build script at their own terminal and then presses Ctrl-C, which will properly terminate any build script!

So I would highly desire that SE implement this!

Thanks,
Rob
« Last Edit: August 07, 2017, 01:47:56 AM by rowbearto »

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
« Reply #2 on: August 07, 2017, 04:03:20 PM »
Pseudo TTY is great when it works. Most of the time it has annoying problems and overall works worse. The +supty invocation option allows the build window to create a pseudo TTY.

We will see if there are any improvements we can make for Ctrl+C without a pseudo TTY.

rowbearto

  • Senior Community Member
  • Posts: 2335
  • Hero Points: 132
« Reply #3 on: August 07, 2017, 04:09:39 PM »
Instead of using a pseudo-tty, when you fork the new process to run the build script, you can see if you can assign it a new process group.

Then it would be easy to send the SIGINT to the new process group.

rowbearto

  • Senior Community Member
  • Posts: 2335
  • Hero Points: 132
« Reply #4 on: August 07, 2017, 04:16:15 PM »
If vsbuild is a C/C++ program, you can call setpgid() to give the child its own process group. Then when vsbuild catches a SIGINT, it can send a new SIGINT to the child's process group.
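A minimal sketch of that idea, written in Python rather than C so it can be run as is (os.setpgid and os.killpg wrap the same system calls). Nothing here reflects how vsbuild is actually written; run_in_own_group, forward_sigint and demo are hypothetical names, and "sleep 30" stands in for the build script.

```python
import os
import signal

def run_in_own_group(argv):
    """Fork and exec argv in a new process group; return the child's PID."""
    ready_r, ready_w = os.pipe()      # close-on-exec by default in Python 3
    pid = os.fork()
    if pid == 0:                      # child
        try:
            os.close(ready_r)
            os.setpgid(0, 0)          # become leader of a new process group
            signal.signal(signal.SIGINT, signal.SIG_DFL)
            os.execvp(argv[0], argv)  # ready_w closes here on success
        finally:
            os._exit(127)             # only reached if something above failed
    os.close(ready_w)
    try:
        os.setpgid(pid, pid)          # repeat in the parent to close the race
    except (PermissionError, ProcessLookupError):
        pass                          # child already exec'd or already gone
    os.read(ready_r, 1)               # returns b"" once the child has exec'd
    os.close(ready_r)
    return pid

def forward_sigint(pgid):
    """Relay SIGINT to process group pgid; return the previous handler."""
    def relay(signum, frame):
        os.killpg(pgid, signal.SIGINT)
    return signal.signal(signal.SIGINT, relay)

def demo():
    child = run_in_own_group(["sleep", "30"])
    previous = forward_sigint(child)  # the child's PID is also its group ID
    try:
        os.kill(os.getpid(), signal.SIGINT)   # simulate stop-build reaching the wrapper
        _, status = os.waitpid(child, 0)
    finally:
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGINT, restore)
    # True if the child was terminated by the relayed SIGINT.
    return os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGINT

if __name__ == "__main__":
    print(demo())
```

Calling setpgid() in both the child and the parent is the usual way to make sure the group exists no matter which of the two runs first after the fork.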

Clark

  • SlickEdit Team Member
  • Senior Community Member
  • Posts: 6862
  • Hero Points: 528
« Reply #5 on: August 08, 2017, 03:46:22 PM »
Thanks for the sample. Definitely helpful as a test case.

You were on the right track except what you are calling the pgid is actually the sid.

The bug is that when vsbuild runs a build command, it's creating a new group id. The "secsh" shell doesn't have that bug. SlickEdit is sending a SIGINT to the secsh process group but vsbuild is messing that up for child processes.

The fix is a bit trickier than that, but only because vsbuild is using standard shelling code we have, and we have to make sure that any change we make here only affects vsbuild and possibly some other places. Sounds simple but it isn't. We should be able to fix this but we have to be very careful.
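The failure mode described above can be reproduced in a small Python sketch. It makes no claim about SlickEdit's internals: the two workers are hypothetical stand-ins, one for a process in the group that receives the SIGINT and one for a build command that was moved into a new group of its own.

```python
import os
import signal
import subprocess
import sys

# Hypothetical worker: reports SIGINT if it ever receives one.
WORKER = r"""
import signal, sys, time
name = sys.argv[1]
def on_sigint(signum, frame):
    print(name + " got SIGINT", flush=True)
    sys.exit(0)
signal.signal(signal.SIGINT, on_sigint)
print("ready", flush=True)
while True:
    time.sleep(1)
"""

def spawn(name):
    # Each worker is started as the leader of its own new process group.
    return subprocess.Popen(
        [sys.executable, "-c", WORKER, name],
        stdout=subprocess.PIPE, text=True,
        preexec_fn=os.setpgrp,
    )

def interrupt_wrong_group():
    shell_side = spawn("shell side")      # group that the SIGINT is sent to
    build_side = spawn("build command")   # lives in a different group
    for proc in (shell_side, build_side):
        proc.stdout.readline()            # wait for "ready"
    os.killpg(shell_side.pid, signal.SIGINT)
    shell_output = shell_side.communicate()[0].strip()
    build_still_running = build_side.poll() is None
    build_side.terminate()                # clean up the survivor
    build_side.communicate()
    return shell_output, build_still_running

if __name__ == "__main__":
    print(interrupt_wrong_group())
```

Only the process in the signalled group reports the SIGINT; the one in the other group keeps running until it is cleaned up separately.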

Thanks for pointing this out. It appeared that Ctrl+C was working but in some cases like your test case it fails.