Explanations
Verb phrases indicating changes or conditions regarding availability or accessibility
Negative Logits
Token     Logit
ials      -0.18
oth       -0.15
azen      -0.15
532       -0.15
YTE       -0.14
лада      -0.14
gag       -0.14
oller     -0.14
iac       -0.14
OTH       -0.14
Positive Logits
Token      Logit
chas       0.25
traction   0.24
dice       0.20
underway   0.20
attention  0.18
away       0.17
dice       0.17
cha        0.17
away       0.16
Dice       0.16
Activation density: 0.055%