Explanations
words related to actions of taking or grabbing
Negative Logits
xes       -0.16
/on       -0.15
ãģĦãģĦ    -0.15
licht     -0.14
anity     -0.14
sein      -0.14
meer      -0.14
stras     -0.14
under     -0.14
dest      -0.14
Positive Logits
aways      0.16
rypton     0.14
oot        0.14
off        0.14
advantage  0.14
/report    0.14
Flight     0.14
phoon      0.13
inversion  0.13
chal       0.13
Activations Density 0.145%