Explanations
Infinitive verb forms, particularly "to" followed by a verb
Negative Logits
Token       Logit
anca        -0.17
iliz        -0.16
fewer       -0.15
@student    -0.15
uc          -0.15
abei        -0.14
ìķħ         -0.14
chances     -0.14
antly       -0.14
patches     -0.14
Positive Logits
Token       Logit
sap         0.16
s           0.14
oping       0.14
Reuse       0.14
MODE        0.14
Rough       0.14
best        0.13
ÃŃž         0.13
ops         0.13
eyh         0.13
Activations Density: 0.039%