INDEX
Explanations
mentions of taking breaks
Negative Logits
amm     -0.16
engo    -0.15
adt     -0.15
irth    -0.15
è£ı     -0.14
ités    -0.14
Wik     -0.14
równ    -0.13
ratt    -0.13
xaa     -0.13
Positive Logits
orative  0.16
unnel    0.16
$MESS    0.15
á»ij     0.15
ÏĩÏģÏĮ   0.15
unkt     0.15
osaur    0.14
icle     0.14
Airways  0.14
éru      0.14
Activations Density: 0.014%