Explanations
instances of the word "anything" and related phrases indicative of uncertainty or lack of specificity
Negative Logits
ря      -0.17
032     -0.16
inya    -0.15
Spir    -0.15
yal     -0.15
034     -0.14
habit   -0.14
kili    -0.14
Barker  -0.14
emann   -0.14
Positive Logits
нки     0.16
вищ     0.15
arf     0.15
ýt      0.14
conti   0.14
рут     0.14
ooks    0.14
arc     0.14
SAL     0.14
AIN     0.14
Activations Density 0.262%