There is increased usage of the snowflake-01 bridge since yesterday, likely related to protests/shutdowns in Iran. The 4 load-balanced tor instances, which recently were at about 60% CPU at the steady state, are currently near 100%.
I am planning to increase the number of instances today.
I am also seeing a lot of "no address in clientID-to-IP map (capacity 10240)" (dozens per second), so I will increase that parameter at the same time.
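For context on what that log line means, here is a sketch of how a bounded clientID-to-IP map of this kind typically works. This is illustrative only, not the real snowflake-server data structure, and the names are made up; the point is that under heavy load a fixed-capacity FIFO evicts entries before their sessions end, so raising the capacity makes the error less likely at the cost of a little memory.

```go
// Illustrative sketch only (not the actual snowflake-server code): a
// fixed-capacity FIFO map from ClientID to client IP address. When more
// than `capacity` clients are active, the oldest entries are evicted,
// which produces "no address in clientID-to-IP map" lookups later.
package clientidmap

import "sync"

type clientIDMap struct {
	mu       sync.Mutex
	capacity int
	order    []string          // insertion order, oldest first
	addrs    map[string]string // ClientID -> client IP address
}

func newClientIDMap(capacity int) *clientIDMap {
	return &clientIDMap{
		capacity: capacity,
		addrs:    make(map[string]string),
	}
}

// Set records the address for a ClientID, evicting the oldest entry if
// the map is full.
func (m *clientIDMap) Set(clientID, addr string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.addrs[clientID]; !ok {
		if len(m.order) >= m.capacity {
			oldest := m.order[0]
			m.order = m.order[1:]
			delete(m.addrs, oldest)
		}
		m.order = append(m.order, clientID)
	}
	m.addrs[clientID] = addr
}

// Get returns the recorded address, or ok=false if the entry has
// already been evicted.
func (m *clientIDMap) Get(clientID string) (addr string, ok bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	addr, ok = m.addrs[clientID]
	return addr, ok
}
```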
After this, I don't think this bridge can easily sustain another doubling in traffic. It would be good to start sharing load with snowflake-02 sooner rather than later (https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...).
Here's a bridge-extra-info descriptor from a couple of days ago. (Remember, this represents 1/4 of the traffic.) Here `ir` is around tenth place by number of users.
https://metrics.torproject.org/collector/recent/bridge-descriptors/extra-inf...
``` @type bridge-extra-info 1.3 extra-info flakey1 5481936581E23D2D178105D44DB6915AB06BFB7F published 2022-09-20 13:10:55 write-history 2022-09-20 03:48:24 (86400 s) 1530049041408,1515073429504,1571942788096,1624372588544,1540386311168 read-history 2022-09-20 03:48:24 (86400 s) 1508810912768,1497901086720,1553884182528,1606369537024,1525348965376 dirreq-stats-end 2022-09-20 05:10:35 (86400 s) dirreq-v3-ips ru=13792,us=1864,cn=1240,de=648,gb=384,??=368,by=368,in=280,fr=272,ir=272,ua=224,ca=168,nl=160,br=144,au=136,es=112,it=112,eg=96,sa=80,mx=72,se=72,ro=64,tr=64,ae=56,ch=56,pl=56,be=48,cz=48,jp=48,za=48,at=40,fi=40,id=40,kz=40,lv=40,il=32,kr=32,ph=32,ar=24,az=24,bd=24,cl=24,dk=24,dz=24,ee=24,hk=24,hu=24,ke=24,lt=24,ng=24,no=24,nz=24,pk=24,pt=24,sg=24,th=24,bg=16,co=16,ec=16,gr=16,ie=16,iq=16,ma=16,md=16,mu=16,my=16,si=16,tn=16,tw=16,uz=16,vn=16,af=8,al=8,am=8,ap=8,ba=8,bf=8,bh=8,bj=8,bn=8,bo=8,bs=8,bz=8,ci=8,cm=8,cr=8,cu=8,cv=8,cw=8,cy=8,do=8,eu=8,ge=8,gh=8,gn=8,gt=8,hn=8,hr=8,ht=8,is=8,jm=8,jo=8,kg=8,kh=8,kw=8,lb=8,lk=8,lu=8,ly=8,mg=8,mk=8,ml=8,mm=8,mt=8,mv=8,na=8,ne=8,np=8,om=8,pa=8,pe=8,pg=8,ps=8,py=8,qa=8,rs=8,rw=8,sc=8,sd=8,sk=8,sn=8,so=8,sv=8,sy=8,tg=8,tj=8,tm=8,tt=8,tz=8,ug=8,uy=8,ve=8,vi=8,ye=8,zm=8,zw=8 dirreq-v3-reqs ru=25256,??=4032,us=3040,cn=2408,de=1008,gb=616,by=608,fr=472,ua=472,ir=448,in=424,nl=304,ca=296,au=256,br=224,it=200,es=192,eg=136,se=128,mx=120,ro=120,sa=112,ae=88,ch=88,jp=88,pl=88,tr=88,cz=80,lv=80,ph=80,be=72,kr=72,za=72,at=64,fi=64,kz=64,id=56,no=56,ar=48,hk=48,il=40,lb=40,lt=40,pk=40,az=32,bd=32,cl=32,co=32,dk=32,hu=32,ke=32,nz=32,pt=32,sg=32,bg=24,dz=24,ec=24,ee=24,gr=24,mu=24,my=24,ng=24,si=24,so=24,th=24,tw=24,uz=24,am=16,ba=16,bz=16,cm=16,cr=16,hr=16,ie=16,iq=16,jo=16,lk=16,lu=16,ma=16,md=16,mm=16,rs=16,sk=16,tn=16,vn=16,af=8,al=8,ap=8,bf=8,bh=8,bj=8,bn=8,bo=8,bs=8,ci=8,cu=8,cv=8,cw=8,cy=8,do=8,eu=8,ge=8,gh=8,gn=8,gt=8,hn=8,ht=8,is=8,jm=8,kg=8,kh=8,kw=8,ly=8,mg=8,mk=8,ml=8,mt=8,mv=8,na=8,ne=8,np=8,om=8,pa=8,pe=8,pg=8,ps=8,py=8,qa=8,rw=8,sc=8,sd=8,sn=8,sv=8,sy=8,tg=8,tj=8,tm=8,tt=8,tz=8,ug=8,uy=8,ve=8,vi=8,ye=8,zm=8,zw=8 bridge-stats-end 2022-09-20 05:10:44 (86400 s) bridge-ips ru=18816,us=2984,cn=1720,de=1024,gb=640,by=576,??=496,in=488,fr=400,ir=400,ua=280,br=256,nl=256,ca=232,au=192,it=168,eg=160,es=152,sa=128,pl=112,tr=112,mx=104,se=104,ch=96,ro=96,ae=88,jp=80,at=72,be=72,cz=72,id=72,fi=64,kz=56,ph=56,za=56,kr=48,lv=48,az=40,dk=40,il=40,ng=40,no=40,pk=40,ar=32,cl=32,dz=32,gr=32,hk=32,hu=32,ke=32,lt=32,nz=32,pt=32,th=32,bd=24,bg=24,co=24,ee=24,ie=24,ma=24,mu=24,my=24,sg=24,tw=24,uz=24,vn=24,am=16,cr=16,ec=16,hr=16,iq=16,jo=16,kw=16,lk=16,lu=16,md=16,pe=16,py=16,rs=16,sd=16,si=16,sk=16,tm=16,tn=16,ug=16,ve=16,af=8,al=8,ap=8,ba=8,bf=8,bh=8,bj=8,bn=8,bo=8,bs=8,bw=8,ci=8,cm=8,cu=8,cv=8,cw=8,cy=8,do=8,et=8,eu=8,ge=8,gh=8,gn=8,gp=8,gt=8,hn=8,ht=8,is=8,jm=8,kg=8,kh=8,ky=8,lb=8,li=8,ly=8,mf=8,mg=8,mk=8,ml=8,mm=8,mq=8,mt=8,mv=8,na=8,ne=8,ni=8,np=8,om=8,pa=8,pg=8,ps=8,qa=8,re=8,rw=8,sc=8,sn=8,so=8,sv=8,sy=8,tg=8,tj=8,tt=8,tz=8,uy=8,vi=8,ye=8,zm=8,zw=8 ```
Here's another bridge-extra-info from about nine hours ago. Now `ir` is in second, third, or fourth place, depending on the metric.
``` @type bridge-extra-info 1.3 extra-info flakey1 5481936581E23D2D178105D44DB6915AB06BFB7F published 2022-09-22 13:18:22 write-history 2022-09-22 03:48:24 (86400 s) 1571942788096,1624372588544,1540386311168,1503769995264,1541499914240 read-history 2022-09-22 03:48:24 (86400 s) 1553884182528,1606369537024,1525348965376,1487172212736,1519574591488 dirreq-stats-end 2022-09-22 05:10:35 (86400 s) dirreq-v3-ips ru=13656,us=2120,ir=2056,cn=1384,de=616,by=432,gb=392,??=384,fr=288,in=264,ua=256,ca=168,nl=168,br=152,au=144,es=112,eg=104,it=104,sa=80,pl=72,se=72,tr=72,ae=64,ch=64,mx=64,ro=64,be=48,cz=48,fi=48,jp=48,za=48,at=40,id=40,kz=40,az=32,cl=32,dk=32,il=32,ke=32,kr=32,lt=32,lv=32,ma=32,ng=32,nz=32,pt=32,co=24,ee=24,gr=24,hk=24,hu=24,mu=24,no=24,ph=24,pk=24,tn=24,ar=16,bd=16,bg=16,do=16,dz=16,ge=16,hr=16,ie=16,iq=16,md=16,my=16,rs=16,sg=16,sk=16,th=16,tw=16,ug=16,uz=16,am=8,ap=8,ba=8,bf=8,bh=8,bj=8,bn=8,bo=8,bw=8,ci=8,cm=8,cr=8,cu=8,cv=8,cw=8,cy=8,ec=8,eu=8,ga=8,gh=8,gt=8,gy=8,hn=8,is=8,jm=8,jo=8,kg=8,kh=8,kw=8,ky=8,lb=8,lk=8,lu=8,ly=8,me=8,mk=8,ml=8,mm=8,mo=8,mv=8,mw=8,na=8,ne=8,ni=8,np=8,om=8,pe=8,pr=8,ps=8,py=8,qa=8,re=8,rw=8,sc=8,sd=8,si=8,sn=8,so=8,ss=8,sy=8,tg=8,tm=8,tt=8,tz=8,ve=8,vi=8,vn=8,ye=8,zm=8,zw=8 dirreq-v3-reqs ru=26032,??=4128,us=3408,ir=2792,cn=2784,de=960,by=664,gb=648,ua=592,fr=512,in=360,ca=280,au=272,nl=264,br=248,es=208,it=184,eg=176,mx=152,ae=144,pl=136,ch=128,tr=120,ro=112,se=112,jp=96,sa=96,fi=88,za=80,at=72,be=72,cz=72,il=64,kr=64,kz=64,az=56,dk=56,hu=56,id=56,lt=56,rs=56,hk=48,ke=48,lv=48,cl=40,co=40,ng=40,no=40,nz=40,ph=40,pk=40,pt=40,ee=32,ge=32,gr=32,ma=32,md=32,mu=32,sg=32,tn=32,am=24,ar=24,bd=24,bg=24,dz=24,ie=24,my=24,sk=24,th=24,tw=24,uz=24,cr=16,cy=16,do=16,ec=16,eu=16,hr=16,iq=16,lk=16,ly=16,mm=16,pe=16,si=16,ug=16,vn=16,ap=8,ba=8,bf=8,bh=8,bj=8,bn=8,bo=8,bw=8,ci=8,cm=8,cu=8,cv=8,cw=8,ga=8,gh=8,gt=8,gy=8,hn=8,is=8,jm=8,jo=8,kg=8,kh=8,kw=8,ky=8,lb=8,lu=8,me=8,mk=8,ml=8,mo=8,mv=8,mw=8,na=8,ne=8,ni=8,np=8,om=8,pr=8,ps=8,py=8,qa=8,re=8,rw=8,sc=8,sd=8,sn=8,so=8,ss=8,sy=8,tg=8,tm=8,tt=8,tz=8,ve=8,vi=8,ye=8,zm=8,zw=8 bridge-stats-end 2022-09-22 05:10:44 (86400 s) bridge-ips ru=18440,ir=5200,us=3624,cn=1880,de=984,gb=656,by=648,??=520,in=472,fr=416,ua=320,nl=264,br=240,ca=232,au=216,eg=176,it=160,es=152,sa=136,pl=128,se=120,tr=120,ae=104,ch=104,mx=104,ro=96,be=80,jp=80,cz=72,fi=72,id=72,mu=72,za=72,at=64,ng=64,kz=56,ma=56,tn=56,hk=48,il=48,pk=48,az=40,cl=40,co=40,dk=40,dz=40,ke=40,kr=40,lt=40,lv=40,no=40,nz=40,pt=40,ar=32,gr=32,ie=32,my=32,ph=32,sg=32,bd=24,bg=24,ee=24,hu=24,iq=24,rs=24,th=24,tw=24,vn=24,zm=24,cr=16,cy=16,do=16,ec=16,ge=16,gt=16,hr=16,kw=16,lk=16,md=16,mm=16,om=16,pe=16,sd=16,sk=16,tz=16,ug=16,uz=16,ve=16,af=8,al=8,am=8,ao=8,ap=8,ba=8,bf=8,bh=8,bj=8,bn=8,bo=8,bw=8,bz=8,ci=8,cm=8,cu=8,cv=8,cw=8,eu=8,ga=8,gh=8,gn=8,gu=8,gy=8,hn=8,is=8,jm=8,jo=8,kg=8,kh=8,ky=8,lb=8,lu=8,ly=8,me=8,mk=8,ml=8,mo=8,mt=8,mv=8,mw=8,na=8,ne=8,ni=8,np=8,pg=8,pr=8,ps=8,py=8,qa=8,re=8,rw=8,sc=8,si=8,sn=8,so=8,sr=8,ss=8,sv=8,sy=8,sz=8,tg=8,tj=8,tm=8,tt=8,uy=8,vi=8,ye=8,zw=8 ```
On Thu, Sep 22, 2022 at 09:24:47AM -0600, David Fifield wrote:
There is increased usage of the snowflake-01 bridge since yesterday, likely related to protests/shutdowns in Iran. The 4 load-balanced tor instances, which recently were at about 60% CPU at the steady state, are currently near 100%.
I am planning to increase the number of instances today.
I increased the number of instances without incident: https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
I am also seeing a lot of "no address in clientID-to-IP map (capacity 10240)" (dozens per second), so I will increase that parameter at the same time.
This is among some performance changes that I hope to deploy tomorrow. I've actually deployed them on the snowflake-02 bridge for testing already. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
https://metrics.torproject.org/collector/recent/bridge-descriptors/extra-inf...
@type bridge-extra-info 1.3 extra-info flakey1 5481936581E23D2D178105D44DB6915AB06BFB7F published 2022-09-22 13:18:22 write-history 2022-09-22 03:48:24 (86400 s) 1571942788096,1624372588544,1540386311168,1503769995264,1541499914240 read-history 2022-09-22 03:48:24 (86400 s) 1553884182528,1606369537024,1525348965376,1487172212736,1519574591488 dirreq-stats-end 2022-09-22 05:10:35 (86400 s) dirreq-v3-ips ru=13656,us=2120,ir=2056,cn=1384,de=616,by=432,gb=392,??=384,fr=288,in=264,ua=256,ca=168,nl=168,br=152,au=144,es=112,eg=104,it=104,sa=80,pl=72,se=72,tr=72,ae=64,ch=64,mx=64,ro=64,be=48,cz=48,fi=48,jp=48,za=48,at=40,id=40,kz=40,az=32,cl=32,dk=32,il=32,ke=32,kr=32,lt=32,lv=32,ma=32,ng=32,nz=32,pt=32,co=24,ee=24,gr=24,hk=24,hu=24,mu=24,no=24,ph=24,pk=24,tn=24,ar=16,bd=16,bg=16,do=16,dz=16,ge=16,hr=16,ie=16,iq=16,md=16,my=16,rs=16,sg=16,sk=16,th=16,tw=16,ug=16,uz=16,am=8,ap=8,ba=8,bf=8,bh=8,bj=8,bn=8,bo=8,bw=8,ci=8,cm=8,cr=8,cu=8,cv=8,cw=8,cy=8,ec=8,eu=8,ga=8,gh=8,gt=8,gy=8,hn=8,is=8,jm=8,jo=8,kg=8,kh=8,kw=8,ky=8,lb=8,lk=8,lu=8,ly=8,me=8,mk=8,ml=8,mm=8,mo=8,mv=8,mw=8,na=8,ne=8,ni=8,np=8,om=8,pe=8,pr=8,ps=8,py=8,qa=8,re=8,rw=8,sc=8,sd=8,si=8,sn=8,so=8,ss=8,sy=8,tg=8,tm=8,tt=8,tz=8,ve=8,vi=8,vn=8,ye=8,zm=8,zw=8 dirreq-v3-reqs ru=26032,??=4128,us=3408,ir=2792,cn=2784,de=960,by=664,gb=648,ua=592,fr=512,in=360,ca=280,au=272,nl=264,br=248,es=208,it=184,eg=176,mx=152,ae=144,pl=136,ch=128,tr=120,ro=112,se=112,jp=96,sa=96,fi=88,za=80,at=72,be=72,cz=72,il=64,kr=64,kz=64,az=56,dk=56,hu=56,id=56,lt=56,rs=56,hk=48,ke=48,lv=48,cl=40,co=40,ng=40,no=40,nz=40,ph=40,pk=40,pt=40,ee=32,ge=32,gr=32,ma=32,md=32,mu=32,sg=32,tn=32,am=24,ar=24,bd=24,bg=24,dz=24,ie=24,my=24,sk=24,th=24,tw=24,uz=24,cr=16,cy=16,do=16,ec=16,eu=16,hr=16,iq=16,lk=16,ly=16,mm=16,pe=16,si=16,ug=16,vn=16,ap=8,ba=8,bf=8,bh=8,bj=8,bn=8,bo=8,bw=8,ci=8,cm=8,cu=8,cv=8,cw=8,ga=8,gh=8,gt=8,gy=8,hn=8,is=8,jm=8,jo=8,kg=8,kh=8,kw=8,ky=8,lb=8,lu=8,me=8,mk=8,ml=8,mo=8,mv=8,mw=8,na=8,ne=8,ni=8,np=8,om=8,pr=8,ps=8,py=8,qa=8,re=8,rw=8,sc=8,sd=8,sn=8,so=8,ss=8,sy=8,tg=8,tm=8,tt=8,tz=8,ve=8,vi=8,ye=8,zm=8,zw=8 bridge-stats-end 2022-09-22 05:10:44 (86400 s) bridge-ips ru=18440,ir=5200,us=3624,cn=1880,de=984,gb=656,by=648,??=520,in=472,fr=416,ua=320,nl=264,br=240,ca=232,au=216,eg=176,it=160,es=152,sa=136,pl=128,se=120,tr=120,ae=104,ch=104,mx=104,ro=96,be=80,jp=80,cz=72,fi=72,id=72,mu=72,za=72,at=64,ng=64,kz=56,ma=56,tn=56,hk=48,il=48,pk=48,az=40,cl=40,co=40,dk=40,dz=40,ke=40,kr=40,lt=40,lv=40,no=40,nz=40,pt=40,ar=32,gr=32,ie=32,my=32,ph=32,sg=32,bd=24,bg=24,ee=24,hu=24,iq=24,rs=24,th=24,tw=24,vn=24,zm=24,cr=16,cy=16,do=16,ec=16,ge=16,gt=16,hr=16,kw=16,lk=16,md=16,mm=16,om=16,pe=16,sd=16,sk=16,tz=16,ug=16,uz=16,ve=16,af=8,al=8,am=8,ao=8,ap=8,ba=8,bf=8,bh=8,bj=8,bn=8,bo=8,bw=8,bz=8,ci=8,cm=8,cu=8,cv=8,cw=8,eu=8,ga=8,gh=8,gn=8,gu=8,gy=8,hn=8,is=8,jm=8,jo=8,kg=8,kh=8,ky=8,lb=8,lu=8,ly=8,me=8,mk=8,ml=8,mo=8,mt=8,mv=8,mw=8,na=8,ne=8,ni=8,np=8,pg=8,pr=8,ps=8,py=8,qa=8,re=8,rw=8,sc=8,si=8,sn=8,so=8,sr=8,ss=8,sv=8,sy=8,sz=8,tg=8,tj=8,tm=8,tt=8,uy=8,vi=8,ye=8,zw=8
One day later, `ir` has jumped way out in front.
``` dirreq-stats-end 2022-09-23 05:10:35 (86400 s) dirreq-v3-ips ir=36472,ru=12088,us=5584,cn=1184,de=720,by=448,gb=432,mu=384,??=352,tn=336,fr=328,in=240,nl=224,eg=216,ua=216,ca=160,ma=160,ng=160,br=144,za=144,au=136,it=120,es=104,tr=96,zm=96,ae=88,jp=80,ci=64,pl=64,ro=64,se=64,ch=56,ke=56,mx=56,sa=56,ug=56,fi=48,sd=48,be=40,bf=40,cz=40,hk=40,at=32,dk=32,id=32,il=32,kr=32,kz=32,lv=32,ph=32,pk=32,ar=24,az=24,bd=24,cl=24,gr=24,hu=24,lt=24,mw=24,no=24,nz=24,pt=24,sg=24,sk=24,am=16,bg=16,co=16,cr=16,dz=16,ee=16,ga=16,ge=16,gh=16,ie=16,iq=16,kw=16,md=16,mg=16,my=16,rs=16,sl=16,th=16,tw=16,uz=16,vn=16,af=8,al=8,ao=8,ap=8,aw=8,ba=8,bb=8,bh=8,bi=8,bj=8,bo=8,bw=8,cg=8,cm=8,cu=8,cw=8,cy=8,do=8,ec=8,eu=8,fj=8,gm=8,gq=8,gt=8,gu=8,hn=8,hr=8,is=8,jm=8,jo=8,kg=8,kh=8,ky=8,la=8,lb=8,lk=8,lu=8,ly=8,mk=8,ml=8,mm=8,mt=8,mv=8,mz=8,na=8,ni=8,om=8,pa=8,pe=8,ps=8,py=8,qa=8,re=8,rw=8,sb=8,sc=8,si=8,sn=8,so=8,sr=8,sv=8,sy=8,tg=8,tj=8,tm=8,tt=8,uy=8,ve=8,ye=8,zw=8 dirreq-v3-reqs ir=51832,ru=21040,??=8760,us=7800,cn=2136,de=1136,by=704,gb=664,fr=576,mu=496,tn=480,nl=376,ua=376,in=328,eg=312,ca=296,au=248,ng=240,br=232,ma=216,za=208,es=176,it=176,zm=136,ae=128,jp=128,tr=128,pl=104,ch=96,mx=96,ro=96,ci=88,se=88,ke=80,sa=80,ug=80,cz=72,fi=72,be=64,kr=64,lv=64,sd=64,at=56,bf=56,hk=56,lb=56,ph=56,pk=56,ar=48,id=48,iq=48,az=40,bd=40,dk=40,il=40,kz=40,nz=40,sg=40,cl=32,co=32,hu=32,lt=32,no=32,sk=32,uz=32,am=24,cr=24,ee=24,ga=24,ge=24,gh=24,gr=24,ie=24,md=24,mw=24,my=24,pt=24,rs=24,sl=24,ap=16,bg=16,dz=16,ec=16,eu=16,hr=16,kw=16,lu=16,ly=16,mg=16,th=16,tw=16,vn=16,af=8,al=8,ao=8,aw=8,ba=8,bb=8,bh=8,bi=8,bj=8,bo=8,bw=8,cg=8,cm=8,cu=8,cw=8,cy=8,do=8,fj=8,gm=8,gq=8,gt=8,gu=8,hn=8,is=8,jm=8,jo=8,kg=8,kh=8,ky=8,la=8,lk=8,mk=8,ml=8,mm=8,mt=8,mv=8,mz=8,na=8,ni=8,om=8,pa=8,pe=8,ps=8,py=8,qa=8,re=8,rw=8,sb=8,sc=8,si=8,sn=8,so=8,sr=8,sv=8,sy=8,tg=8,tj=8,tm=8,tt=8,uy=8,ve=8,ye=8,zw=8 bridge-stats-end 2022-09-23 05:10:44 (86400 s) bridge-ips ir=79192,ru=16840,us=12592,cn=1696,de=1320,mu=968,tn=768,gb=744,by=720,fr=600,??=504,eg=472,ma=416,in=408,nl=392,ng=368,za=304,ua=288,ca=272,au=224,br=208,zm=192,it=176,tr=176,ae=152,ci=152,es=152,jp=144,ro=136,se=128,ug=128,ke=120,sd=112,pl=104,sa=104,ch=96,fi=80,mx=72,be=64,bf=64,cz=64,hk=64,id=64,at=56,kr=56,kz=56,ph=56,pk=56,dk=48,ga=48,lv=48,ar=40,az=40,il=40,no=40,sg=40,bd=32,bg=32,cl=32,co=32,ge=32,gh=32,gr=32,hu=32,lt=32,mw=32,my=32,nz=32,pt=32,sk=32,am=24,ap=24,ee=24,ie=24,iq=24,kw=24,mg=24,rs=24,sl=24,th=24,tw=24,vn=24,af=16,cg=16,cm=16,cr=16,dz=16,ec=16,eu=16,gm=16,hr=16,lu=16,ly=16,md=16,mm=16,om=16,pe=16,sn=16,tg=16,uz=16,al=8,ao=8,aw=8,ba=8,bb=8,bh=8,bi=8,bj=8,bo=8,bs=8,bw=8,cd=8,cu=8,cw=8,cy=8,do=8,et=8,fj=8,gq=8,gt=8,gu=8,hn=8,ht=8,is=8,jo=8,kg=8,kh=8,ky=8,la=8,lb=8,lk=8,mk=8,ml=8,mt=8,mv=8,mz=8,na=8,nc=8,ni=8,np=8,pa=8,pf=8,pg=8,ps=8,py=8,qa=8,re=8,rw=8,sb=8,sc=8,si=8,so=8,sr=8,sv=8,sy=8,sz=8,tj=8,tm=8,tt=8,tz=8,uy=8,ve=8,ye=8,zw=8 ```
On Thu, Sep 22, 2022 at 11:25:54PM -0600, David Fifield wrote:
On Thu, Sep 22, 2022 at 09:24:47AM -0600, David Fifield wrote:
There is increased usage of the snowflake-01 bridge since yesterday, likely related to protests/shutdowns in Iran. The 4 load-balanced tor instances, which recently were at about 60% CPU at the steady state, are currently near 100%.
I am planning to increase the number of instances today.
I increased the number of instances without incident: https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
I increased the number of instances again, from 8 to 12. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
I am also seeing a lot of "no address in clientID-to-IP map (capacity 10240)" (dozens per second), so I will increase that parameter at the same time.
This is among some performance changes that I hope to deploy tomorrow. I've actually deployed them on the snowflake-02 bridge for testing already. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
I deployed more optimizations aimed at decreasing memory usage per client. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
I attached a graph of interface bandwidth for the past few days. Outgoing bandwidth reached well over 300 MB/s on September 24. At this moment, traffic is approaching the daily minimum, which is still around 200 MB/s. We'll see what tomorrow brings. If we run into more memory pressure, we have another easy mitigation, which is to decrease the size of client send queues. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla... For CPU pressure, I don't see any quick fixes. In an emergency, we could hack the tor binary to use a static ExtORPort authentication cookie, and remove the extor-static-cookie shim from the pipeline.
David Fifield david@bamsoftware.com wrote Sat, 24 Sep 2022 20:14:18 -0600:
I deployed more optimizations aimed at decreasing memory usage per client. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
FWIW, more RAM is on its way. I hope to install another 48GB today or tomorrow European time, doubling the amount of RAM in the system. Expect downtime while this is being done. I will open a ticket as soon as I know more about the timing.
I attached a graph of interface bandwidth for the past few days. Outgoing bandwidth reached well over 300 MB/s on September 24. At this moment, traffic is approaching the daily minimum, which is still around 200 MB/s. We'll see what tomorrow brings. If we run into more memory pressure, we have another easy mitigation, which is to decrease the size of client send queues. https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla... For CPU pressure, I don't see any quick fixes. In an emergency, we could hack the tor binary to use a static ExtORPort authentication cookie, and remove the extor-static-cookie shim from the pipeline.
That might be useful also for bringing down the number of context switches in the system. Not sure that's a problem though, will investigate.
Possibly related, the number of packets per second seems to be capped at 400 kpps, so something is maxing out at a nice, round figure here:
More granularity shows symmetric "dips" for in- and outbound traffic with a period of about 10-15 seconds:
CPU interrupts, softirq and softnet show matching dips:
It seems likely that we're hitting a limit of some sort and next thing is to figure out if it's a soft limit that we can influence through system configuration or if it's a hardware resource limit.
On Mon, Sep 26, 2022 at 10:39:42AM +0200, Linus Nordberg via anti-censorship-team wrote:
Possibly related, the number of packets per second seems to be capped at 400 kpps, so something is maxing out at a nice, round figure here:
More granularity shows symmetric "dips" for in- and outbound traffic with a period of about 10-15 seconds:
CPU interrupts, softirq and softnet show matching dips:
I see the packets/s exceeding 400k now. But you're right, the daily peaks of the packets/s graph look unnaturally flattened the past few days, see attached graph. (Iran shutdowns marked in red.)
I don't know what those semi-periodic "dips" could be. Maybe garbage collection in the snowflake-server process?
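One cheap way to test the garbage-collection theory would be something like the following. This is hypothetical instrumentation, not code that exists in snowflake-server; running the service with GODEBUG=gctrace=1 would give similar information without a code change.

```go
// A quick, hypothetical check: log GC counts and cumulative pause time
// every few seconds and see whether the increments line up with the
// 10-15 second dips in the bandwidth graphs.
package main

import (
	"log"
	"runtime"
	"time"
)

func logGCStats(interval time.Duration) {
	var prev runtime.MemStats
	for range time.Tick(interval) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		log.Printf("GCs: +%d, GC pause: +%v, heap: %d MiB",
			m.NumGC-prev.NumGC,
			time.Duration(m.PauseTotalNs-prev.PauseTotalNs),
			m.HeapAlloc>>20)
		prev = m
	}
}

func main() {
	go logGCStats(5 * time.Second)
	select {} // stand-in for the real server main loop
}
```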
David Fifield david@bamsoftware.com wrote Mon, 26 Sep 2022 11:04:15 -0600:
On Mon, Sep 26, 2022 at 10:39:42AM +0200, Linus Nordberg via anti-censorship-team wrote:
Possibly related, the number of packets per second seems to be capped at 400 kpps, so something is maxing out at a nice, round figure here:
More granularity shows symmetric "dips" for in- and outbound traffic with a period of about 10-15 seconds:
CPU interrupts, softirq and softnet show matching dips:
I see the packets/s exceeding 400k now. But you're right, the daily peaks of the packets/s graph look unnaturally flattened the past few days, see attached graph. (Iran shutdowns marked in red.)
I don't know what those semi-periodic "dips" could be. Maybe garbage collection in the snowflake-server process?
We also see similar patterns of TCP RSTs (sent by us), which I haven't yet checked for correlation with these dips. If someone would like to look at this, I'm happy to help get the data out. For those with a shell on the box, pointing a browser at localhost:19999 is sufficient.
On Mon, Sep 26, 2022 at 10:39:42AM +0200, Linus Nordberg via anti-censorship-team wrote:
It seems likely that we're hitting a limit of some sort and next thing is to figure out if it's a soft limit that we can influence through system configuration or if it's a hardware resource limit.
tor has a default bandwidth limit, but we should be nowhere close to it, especially distributed across 12 instances:
BandwidthRate N bytes|KBytes|MBytes|GBytes|TBytes|KBits|MBits|GBits|TBits
A token bucket limits the average incoming bandwidth usage on this node to the specified number of bytes per second, and the average outgoing bandwidth usage to that same value. If you want to run a relay in the public network, this needs to be at the very least 75 KBytes for a relay (that is, 600 kbits) or 50 KBytes for a bridge (400 kbits) — but of course, more is better; we recommend at least 250 KBytes (2 mbits) if possible. (Default: 1 GByte)
I do not see any rate limit enabled in /etc/haproxy/haproxy.cfg.
I checked the number of sockets connected to the haproxy frontend port, thinking that we may be running out of localhost 4-tuples. It's still in bounds (but we may have to figure something out for that eventually).
# ss -n | grep -c '127.0.0.1:10000\s*$'
27314
# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 15000 64000
According to https://stackoverflow.com/a/3923785, some other parameters that may be important are
# sysctl net.ipv4.tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 60
# cat /proc/sys/net/netfilter/nf_conntrack_max
262144
# sysctl net.core.netdev_max_backlog
net.core.netdev_max_backlog = 1000
Ethernet txqueuelen (1000)
net.core.netdev_max_backlog is the "maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them." https://www.kernel.org/doc/html/latest/admin-guide/sysctl/net.html#netdev-ma... But if we were having trouble with backlog buffer sizes, I would expect to see lots of dropped packets, and I don't:
# ethtool -S eno1 | grep dropped
     rx_dropped: 0
     tx_dropped: 0
It may be something inside snowflake-server, for example some central scheduling algorithm that cannot run any faster. (Though if that were the case, I'd expect to see one CPU core at 100%, which I do not.) I suggest doing another round of profiling now that we have taken care of the more obvious hotspots in https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
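For reference, a minimal way to hook profiling into a Go server, assuming snowflake-server does not already expose a debug endpoint (the port number here is arbitrary):

```go
// A low-friction profiling hook: serve net/http/pprof on loopback only,
// then grab a CPU profile while the bridge is under load.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	go func() {
		// Bind to loopback so the profiler is not reachable from outside.
		log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
	}()
	select {} // stand-in for the real server
}
```

A 30-second CPU profile could then be collected with `go tool pprof http://127.0.0.1:6060/debug/pprof/profile?seconds=30`.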
David Fifield david@bamsoftware.com wrote Tue, 27 Sep 2022 08:54:53 -0600:
On Mon, Sep 26, 2022 at 10:39:42AM +0200, Linus Nordberg via anti-censorship-team wrote:
It seems likely that we're hitting a limit of some sort and next thing is to figure out if it's a soft limit that we can influence through system configuration or if it's a hardware resource limit.
tor has a default bandwidth limit, but we should be nowhere close to it, especially distributed across 12 instances:
BandwidthRate N bytes|KBytes|MBytes|GBytes|TBytes|KBits|MBits|GBits|TBits
A token bucket limits the average incoming bandwidth usage on this node to the specified number of bytes per second, and the average outgoing bandwidth usage to that same value. If you want to run a relay in the public network, this needs to be at the very least 75 KBytes for a relay (that is, 600 kbits) or 50 KBytes for a bridge (400 kbits) — but of course, more is better; we recommend at least 250 KBytes (2 mbits) if possible. (Default: 1 GByte)
I do not see any rate limit enabled in /etc/haproxy/haproxy.cfg.
I checked the number of sockets connected to the haproxy frontend port, thinking that we may be running out of localhost 4-tuples. It's still in bounds (but we may have to figure something out for that eventually).
# ss -n | grep -c '127.0.0.1:10000\s*$'
27314
# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 15000 64000
Would more IP addresses and DNS round robin work?
According to https://stackoverflow.com/a/3923785, some other parameters that may be important are
# sysctl net.ipv4.tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 60
# cat /proc/sys/net/netfilter/nf_conntrack_max
262144
Yes, we'd better keep an eye on the conntrack count and either raise the max or get rid of the connection tracking somehow. I've seen warnings from netdata about the count rising above 85%.
# cat /proc/sys/net/netfilter/nf_conntrack_{count,max}
181053
262144
One thing I would like to do soon is to hook up the other NIC and put sshd and wireguard on that while keeping snowflake traffic on the current 10G. That way we could start playing with ethtool to instruct the NIC to do some fancy stuff suggested by anarcat (see below).
I've created https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf... to track this.
# sysctl net.core.netdev_max_backlog
net.core.netdev_max_backlog = 1000
Ethernet txqueuelen (1000)
net.core.netdev_max_backlog is the "maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them." https://www.kernel.org/doc/html/latest/admin-guide/sysctl/net.html#netdev-ma... But if we were having trouble with backlog buffer sizes, I would expect to see lots of dropped packets, and I don't:
# ethtool -S eno1 | grep dropped
     rx_dropped: 0
     tx_dropped: 0
Yes, the lack of drops makes me think we should look elsewhere.
It may be something inside snowflake-server, for example some central scheduling algorithm that cannot run any faster. (Though if that were the case, I'd expect to see one CPU core at 100%, which I do not.) I suggest doing another round of profiling now that we have taken care of the more obvious hotspots in https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
After an interesting chat with anarcat, I think that we are CPU bound, in particular by handling so many interrupts from the NIC and by the high number of context switches. I have two suggestions for how to move forward with this.
First, let's patch tor to get rid of the extor processes, as suggested by David earlier when discussing RAM pressure. This should bring down context switches.
Second, once we've got #40186 sorted, do what's suggested in [1] to bring the interrupt frequency down. This should take some load off the CPUs.
[1] https://www.kernel.org/doc/html/v4.20/networking/i40e.html#interrupt-rate-li...
On Tue, Sep 27, 2022 at 08:22:21PM +0200, Linus Nordberg wrote:
David Fifield david@bamsoftware.com wrote Tue, 27 Sep 2022 08:54:53 -0600:
I checked the number of sockets connected to the haproxy frontend port, thinking that we may be running out of localhost 4-tuples. It's still in bounds (but we may have to figure something out for that eventually).
# ss -n | grep -c '127.0.0.1:10000\s*$'
27314
# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 15000 64000
Would more IP addresses and DNS round robin work?
By more IP addresses you mean more localhost IP addresses, I guess? All of 127.0.0.0/8 is localhost, so we can expand the range of four-tuples by using more addresses from that range in either the source or destination address position. haproxy probably has an option to listen on multiple addresses. The trick is actually using the multiple addresses.

I don't think DNS will work directly, because snowflake-server gets the address of its upstream from the TOR_PT_ORPORT environment variable, which is specified to take an IP:port, not a DNS name (and is implemented that way in goptlib).
https://gitweb.torproject.org/torspec.git/tree/pt-spec.txt?id=ec77ae643f3e47...
https://gitweb.torproject.org/pluggable-transports/goptlib.git/tree/pt.go?h=...

You could try using more addresses from 127.0.0.0/8 in the *source* address position, by specifying the second parameter of net.DialTCP to set the source address here:
https://gitweb.torproject.org/pluggable-transports/goptlib.git/tree/pt.go?h=...
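A minimal sketch of that idea, assuming we are willing to bind the dialing side to arbitrary addresses in 127.0.0.0/8 (which Linux routes over lo without extra configuration); the helper names are made up:

```go
// Pick a random address in 127.0.0.0/8 as the *source* of the
// connection to the fixed 127.0.0.1:10000 frontend, so the four-tuple
// varies in the source-IP position and not only in the source-port
// position.
package main

import (
	"fmt"
	"math/rand"
	"net"
)

// randomLoopbackIP returns an address of the form 127.x.y.z.
func randomLoopbackIP() net.IP {
	return net.IPv4(127,
		byte(rand.Intn(256)),
		byte(rand.Intn(256)),
		byte(1+rand.Intn(254)))
}

func dialFrontend(raddr string) (*net.TCPConn, error) {
	remote, err := net.ResolveTCPAddr("tcp", raddr)
	if err != nil {
		return nil, err
	}
	// Port 0 means "pick an ephemeral port", as usual; only the source
	// IP is being varied here.
	local := &net.TCPAddr{IP: randomLoopbackIP()}
	return net.DialTCP("tcp", local, remote)
}

func main() {
	conn, err := dialFrontend("127.0.0.1:10000")
	if err != nil {
		fmt.Println("dial:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected from", conn.LocalAddr())
}
```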
It may be something inside snowflake-server, for example some central scheduling algorithm that cannot run any faster. (Though if that were the case, I'd expect to see one CPU core at 100%, which I do not.) I suggest doing another round of profiling now that we have taken care of the more obvious hotspots in https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
After an interesting chat with anarcat, I think that we are CPU bound, in particular by handling so many interrupts from the NIC and by the high number of context switches. I have two suggestions for how to move forward with this.
First, let's patch tor to get rid of the extor processes, as suggested by David earlier when discussing RAM pressure. This should bring down context switches.
The easiest way to do this is probably to comment out the re-randomization of the ExtORPort auth cookie file on startup, and replace the existing cookie files with static files. Or even just comment out the failure case in connection_ext_or_auth_handle_client_hash. https://gitweb.torproject.org/tor.git/tree/src/feature/relay/ext_orport.c?h=...
The uncontrollable rerandomization of auth cookies is the whole reason for extor-static-cookie: https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-si...
Here's my post requesting support in core tor: https://lists.torproject.org/pipermail/tor-dev/2022-February/014695.html
David Fifield david@bamsoftware.com wrote Tue, 27 Sep 2022 14:40:48 -0600:
On Tue, Sep 27, 2022 at 08:22:21PM +0200, Linus Nordberg wrote:
David Fifield david@bamsoftware.com wrote Tue, 27 Sep 2022 08:54:53 -0600:
I checked the number of sockets connected to the haproxy frontend port, thinking that we may be running out of localhost 4-tuples. It's still in bounds (but we may have to figure something out for that eventually).
# ss -n | grep -c '127.0.0.1:10000\s*$'
27314
# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 15000 64000
Would more IP addresses and DNS round robin work?
By more IP addresses you mean more localhost IP addresses, I guess?
I was quite confused yesterday: I mixed up 4-tuples on our (only) externally reachable address with 4-tuples on localhost addresses. Please ignore that, and thanks for clarifying.
Getting rid of extor should lower the need for localhost 4-tuples, shouldn't it?
On Wed, Sep 28, 2022 at 11:31:05AM +0200, Linus Nordberg wrote:
David Fifield david@bamsoftware.com wrote Tue, 27 Sep 2022 14:40:48 -0600:
On Tue, Sep 27, 2022 at 08:22:21PM +0200, Linus Nordberg wrote:
David Fifield david@bamsoftware.com wrote Tue, 27 Sep 2022 08:54:53 -0600:
I checked the number of sockets connected to the haproxy frontend port, thinking that we may be running out of localhost 4-tuples. It's still in bounds (but we may have to figure something out for that eventually).
# ss -n | grep -c '127.0.0.1:10000\s*$'
27314
# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 15000 64000
Would more IP addresses and DNS round robin work?
By more IP addresses you mean more localhost IP addresses, I guess?
I was quite confused yesterday: I mixed up 4-tuples on our (only) externally reachable address with 4-tuples on localhost addresses. Please ignore that, and thanks for clarifying.
Getting rid of extor should lower the need for localhost 4-tuples, shouldn't it?
No, not really. The problem is not the total number of 127.0.0.1 four-tuples in use — there are ≈2^32 of those — it's when one end has a fixed port number. The bottleneck in this case is the link between snowflake-server and haproxy (see diagram): https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guid...
haproxy binds to 127.0.0.1:10000 and snowflake-server connects to haproxy from 127.0.0.1 and an ephemeral port, so three of the four elements of the four-tuple are fixed, permitting only ≈2^16 different tuples:
(127.0.0.1, X, 127.0.0.1, 10000)
The whole pluggable transports interface is built around this model of localhost TCP sockets; I think it did not anticipate scale like this. snowflake-server gets the address 127.0.0.1:10000 from an environment variable; see in /etc/systemd/system/snowflake-server.service:
Environment=TOR_PT_EXTENDED_SERVER_PORT=127.0.0.1:10000
When snowflake-server does pt.DialOr, it's the above address that it makes a TCP connection to. https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
snowflake-server *thinks* it is talking to an upstream tor process's ExtORPort at that address, when actually the connection is intermediated by haproxy (because a single tor process can only handle a limited amount of traffic) and extor-static-cookie (because each tor instance uses a different random authentication key).
haproxy, of course, can listen on multiple ports on its frontend, but TOR_PT_EXTENDED_SERVER_PORT is specified to contain only a single address: https://gitweb.torproject.org/torspec.git/tree/pt-spec.txt?id=ec77ae643f3e47...
That said, none of the above prevents us from hacking around the pluggable transports model where it is constraining. We can free up four-tuple space by varying any of the four elements in the example above, or by using something other than TCP sockets for one or more localhost links.

For example, we could hack pt.DialOr to use a random source address in the 127.0.0.0/8 range; that would give us an additional factor of 2^24 between snowflake-server and haproxy. Or we could replace that link with a Unix domain socket. It would just require an alternative means of passing the socket address into snowflake-server, because TOR_PT_EXTENDED_SERVER_PORT cannot represent such an address, and a different version of the pt.DialOr function that does not have the assumption of TCP baked in.
https://pkg.go.dev/git.torproject.org/pluggable-transports/goptlib.git#DialO...
https://gitweb.torproject.org/pluggable-transports/goptlib.git/tree/pt.go?h=...
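As a rough sketch of the Unix-socket variant: the SNOWFLAKE_EXT_ORPORT_SOCKET variable and fallback logic below are hypothetical (only TOR_PT_EXTENDED_SERVER_PORT is real, from the PT spec), and haproxy would need a matching "bind" line for the socket path on its frontend.

```go
// Prefer a Unix domain socket for the snowflake-server -> haproxy link,
// since it is not subject to the TCP four-tuple limit; otherwise fall
// back to the PT-spec TCP address.
package main

import (
	"log"
	"net"
	"os"
)

func dialExtORPort() (net.Conn, error) {
	// Hypothetical override, not part of the PT spec.
	if path := os.Getenv("SNOWFLAKE_EXT_ORPORT_SOCKET"); path != "" {
		return net.Dial("unix", path)
	}
	return net.Dial("tcp", os.Getenv("TOR_PT_EXTENDED_SERVER_PORT"))
}

func main() {
	conn, err := dialExtORPort()
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Println("connected to", conn.RemoteAddr())
}
```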
Removing extor-static-cookie from the chain would not have an effect on the need for four-tuples, since each of them uses a distinct port number and only has 1/12 of the connections of the bottleneck link.
On Wed, Sep 28, 2022 at 09:40:37AM -0600, David Fifield wrote:
No, not really. The problem is not the total number of 127.0.0.1 four-tuples in use — there are ≈2^32 of those — it's when one end has a fixed port number. The bottleneck in this case is the link between snowflake-server and haproxy (see diagram): https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guid...
My analysis here was incomplete. It is true that when counting distinct four-tuples, the total number of sockets does not really matter. But there's another constraint to consider, which is the limited number of ephemeral ports available for source addresses in localhost connections. We have actually been running into this problem for the past 2 days ("cannot assign requested address"):
https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowfla...
I'm planning to mitigate it by having localhost communication use different IP addresses (e.g. 127.0.0.2) as source addresses when possible.
On 2022-09-22 11:24, David Fifield wrote:
There is increased usage of the snowflake-01 bridge since yesterday, likely related to protests/shutdowns in Iran. The 4 load-balanced tor instances, which recently were at about 60% CPU at the steady state, are currently near 100%.
I am planning to increase the number of instances today.
It looks like we're also reaching proxy capacity again for the first time in a while.
I've attached a visualization of available proxies that are compatible with all types of client NATs. You can see in the first image that the number of idle proxies has gone to zero and all available proxies are being matched. The second image shows spikes in the number of clients denied a working proxy.
The depletion of this proxy pool could be due to the high amount of mobile network usage, since these networks are likely to have complex and restrictive NAT topologies.
On Fri, Sep 23, 2022 at 11:01:37AM -0400, Cecylia Bocovich wrote:
On 2022-09-22 11:24, David Fifield wrote:
There is increased usage of the snowflake-01 bridge since yesterday, likely related to protests/shutdowns in Iran. The 4 load-balanced tor instances, which recently were at about 60% CPU at the steady state, are currently near 100%.
I am planning to increase the number of instances today.
It looks like we're also reaching proxy capacity again for the first time in a while.
I've attached a visualization of available proxies that are compatible with all types of client NATs. You can see in the first image that the number of idle proxies has gone to zero and all available proxies are being matched. The second image shows spikes in the number of clients denied a working proxy.
The depletion of this proxy pool could be due to the high amount of mobile network usage, since these networks are likely to have complex and restrictive NAT topologies.
Besides recruiting more proxies, could we stretch the existing unrestricted proxies further? When a proxy finds its own NAT type to be unrestricted, it could increase its polling frequency and/or concurrent capacity.
On 2022-09-27 09:49, David Fifield wrote:
On Fri, Sep 23, 2022 at 11:01:37AM -0400, Cecylia Bocovich wrote:
On 2022-09-22 11:24, David Fifield wrote:
There is increased usage of the snowflake-01 bridge since yesterday, likely related to protests/shutdowns in Iran. The 4 load-balanced tor instances, which recently were at about 60% CPU at the steady state, are currently near 100%.
I am planning to increase the number of instances today.
It looks like we're also reaching proxy capacity again for the first time in a while.
I've attached a visualization of available proxies that are compatible with all types of client NATs. You can see in the first image that the number of idle proxies has gone to zero and all available proxies are being matched. The second image shows spikes in the number of clients denied a working proxy.
The depletion of this proxy pool could be due to the high amount of mobile network usage, since these networks are likely to have complex and restrictive NAT topologies.
Besides recruiting more proxies, could we stretch the existing unrestricted proxies further? When a proxy finds its own NAT type to be unrestricted, it could increase its polling frequency and/or concurrent capacity.
That's a good idea.
I've started working on tackling the problem from the other side[0]: we have a lot of clients who may not need unrestricted proxies pulling from that pool because their NAT type is unknown (see attached image).
If we have these clients optimistically pull from the other pool, we could reduce the load substantially. However, this change will take a while to roll out because it has to be included in a Tor Browser release.
Updating proxies to poll more frequently is easier to roll out quickly.
[0] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
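To make the "optimistically pull from the other pool" idea concrete, here is an illustrative sketch; the names and pool-selection structure are made up, not the real broker or client code.

```go
// A client with unknown NAT type is currently treated like a restricted
// client and so consumes a scarce unrestricted proxy; the proposed
// change lets it optimistically try the restricted-proxy pool first and
// fall back only if that fails.
package matching

type natType string

const (
	natUnrestricted natType = "unrestricted"
	natRestricted   natType = "restricted"
	natUnknown      natType = "unknown"
)

type proxyPool string

const (
	restrictedProxies   proxyPool = "restricted"
	unrestrictedProxies proxyPool = "unrestricted"
)

// poolFor chooses which proxy pool to match a client from.
func poolFor(client natType, optimistic bool) proxyPool {
	switch client {
	case natUnrestricted:
		// Any proxy will do; keep the unrestricted ones for clients
		// that actually need them.
		return restrictedProxies
	case natRestricted:
		return unrestrictedProxies
	default: // natUnknown
		if optimistic {
			return restrictedProxies // retry against the other pool on failure
		}
		return unrestrictedProxies
	}
}
```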
On 2022-09-27 10:56, Cecylia Bocovich wrote:
On 2022-09-27 09:49, David Fifield wrote:
On Fri, Sep 23, 2022 at 11:01:37AM -0400, Cecylia Bocovich wrote:
On 2022-09-22 11:24, David Fifield wrote:
There is increased usage of the snowflake-01 bridge since yesterday, likely related to protests/shutdowns in Iran. The 4 load-balanced tor instances, which recently were at about 60% CPU at the steady state, are currently near 100%.
I am planning to increase the number of instances today.
It looks like we're also reaching proxy capacity again for the first time in a while.
I've attached a visualization of available proxies that are compatible with all types of client NATs. You can see in the first image that the number of idle proxies has gone to zero and all available proxies are being matched. The second image shows spikes in the number of clients denied a working proxy.
The depletion of this proxy pool could be due to the high amount of mobile network usage, since these networks are likely to have complex and restrictive NAT topologies.
Besides recruiting more proxies, could we stretch the existing unrestricted proxies further? When a proxy finds its own NAT type to be unrestricted, it could increase its polling frequency and/or concurrent capacity.
That's a good idea.
I've started working on tackling the problem from the other side[0]: we have a lot of clients who may not need unrestricted proxies pulling from that pool because their NAT type is unknown (see attached image).
If we have these clients optimistically pull from the other pool, we could reduce the load substantially. However, this change will take a while to roll out because it has to be included in a Tor Browser release.
Updating proxies to poll more frequently is easier to roll out quickly.
[0] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Oops, attaching image.
On Tue, Sep 27, 2022 at 10:56:57AM -0400, Cecylia wrote:
On 2022-09-27 10:56, Cecylia Bocovich wrote:
On 2022-09-27 09:49, David Fifield wrote:
Besides recruiting more proxies, could we stretch the existing unrestricted proxies further? When a proxy finds its own NAT type to be unrestricted, it could increase its polling frequency and/or concurrent capacity.
That's a good idea.
I've started working on tackling the problem from the other side[0]: we have a lot of clients who may not need unrestricted proxies pulling from that pool because their NAT type is unknown (see attached image).
If we have these clients optimistically pull from the other pool, we could reduce the load substantially. However, this change will take a while to roll out because it has to be included in a Tor Browser release.
Updating proxies to poll more frequently is easier to roll out quickly.
[0] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Oops, attaching image.
Could probetest being sick also contribute to a lack of proxies with the right NAT types? It's at 100% CPU again, just now as I check it.
/etc/runit/snowflake-probetest/run currently has "timeout 7d"; what if we restarted it every hour?
On 2022-09-28 16:14, David Fifield wrote:
On Tue, Sep 27, 2022 at 10:56:57AM -0400, Cecylia wrote:
On 2022-09-27 10:56, Cecylia Bocovich wrote:
On 2022-09-27 09:49, David Fifield wrote:
Besides recruiting more proxies, could we stretch the existing unrestricted proxies further? When a proxy finds its own NAT type to be unrestricted, it could increase its polling frequency and/or concurrent capacity.
That's a good idea.
I've started working on tackling the problem from the other side[0]: we have a lot of clients who may not need unrestricted proxies pulling from that pool because their NAT type is unknown (see attached image).
If we have these clients optimistically pull from the other pool, we could reduce the load substantially. However, this change will take a while to roll out because it has to be included in a Tor Browser release.
Updating proxies to poll more frequently is easier to roll out quickly.
[0] https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Oops, attaching image.
Could probetest being sick also contribute to a lack of proxies with the right NAT types? It's at 100% CPU again, just now as I check it.
/etc/runit/snowflake-probetest/run currently has "timeout 7d"; what if we restarted it every hour?
Good idea. I just set it arbitrarily to 4 hours. Right now, proxies re-attempt to discover their NAT type every 24 hours; we could shorten this interval for proxies with unknown NAT types.
On 2022-09-29 11:05, Cecylia Bocovich wrote:
Besides recruiting more proxies, could we stretch the existing unrestricted proxies further? When a proxy finds its own NAT type to be unrestricted, it could increase its polling frequency and/or concurrent capacity.
I've opened a merge request that will both increase the polling frequency and bump up the max number of clients from 1 to 2 for unrestricted proxies. This change happens once the proxies have successfully opened a datachannel with a client.
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowf...
Proxies that work well with clients in their pool already poll pretty frequently (once every 60s), so I don't think increasing this to 30 seconds will make much of a difference, but doubling the capacity for unrestricted web-based proxies should. I'm hesitant to increase the maximum number of clients further without more testing.
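For illustration, a rough sketch of that change using the numbers above; the names are made up and this is not the actual proxy code.

```go
// Once an unrestricted proxy has successfully opened a data channel
// with a client, it polls every 30s instead of 60s and advertises
// capacity for 2 clients instead of 1.
package main

import "time"

const (
	defaultPollInterval = 60 * time.Second
	fastPollInterval    = 30 * time.Second
)

type proxyState struct {
	natUnrestricted bool
	provedWorking   bool // a data channel has been opened with a client
}

// pollInterval and maxClients together describe how aggressively this
// proxy offers itself to the broker.
func (p proxyState) pollInterval() time.Duration {
	if p.natUnrestricted && p.provedWorking {
		return fastPollInterval
	}
	return defaultPollInterval
}

func (p proxyState) maxClients() int {
	if p.natUnrestricted && p.provedWorking {
		return 2
	}
	return 1
}

func main() {
	p := proxyState{natUnrestricted: true, provedWorking: true}
	_ = p.pollInterval() // 30s
	_ = p.maxClients()   // 2
}
```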