Explanations
words related to physical actions or activities involving movement
Negative Logits
Token          Logit
vale           -0.22
eus            -0.20
rophe          -0.20
ively          -0.19
ocks           -0.19
eza            -0.18
ome            -0.17
antly          -0.17
edException    -0.16
lide           -0.16
Positive Logits
Token          Logit
bing            0.57
bed             0.45
bers            0.42
ging            0.41
ming            0.40
ting            0.39
ged             0.37
ber             0.35
by              0.34
ding            0.34
Activation Density: 0.111%