INDEX
Explanations
words related to organization and structure
Negative Logits
Token        Logit
footed       -0.66
ãĥŃ          -0.62
Rush         -0.60
ãĥ¯          -0.59
mercial      -0.59
Anglo        -0.58
umbledore    -0.57
asive        -0.57
Spoon        -0.56
tumblr       -0.56
Positive Logits
Token        Logit
roth         1.07
opol         0.72
lene         0.70
unsus        0.69
owa          0.68
ovsky        0.67
lde          0.66
ovic         0.65
mber         0.65
bor          0.65
Activations Density: 0.042%