Explanations
words related to comments, explanations, and interactions
Negative Logits
selves       -0.64
common       -0.64
hub          -0.63
aura         -0.60
unison       -0.59
Composite    -0.58
emale        -0.56
ogether      -0.54
avia         -0.54
collective   -0.53
Positive Logits
himself       1.17
Himself       0.80
thence        0.65
lect          0.64
personally    0.64
his           0.63
solo          0.62
imaru         0.60
remorse       0.60
resign        0.60
Activations Density 0.487%
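
A density figure like the one above is usually read as the share of sampled tokens on which the feature fires at all. The sketch below assumes that reading; the page itself does not define the metric, and the function name `activation_density` and its `threshold` parameter are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active (activation strictly greater than the threshold).
    """
    if len(activations) == 0:
        raise ValueError("activations must contain at least one value")
    # Count tokens on which the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy data: 1000 tokens, 5 of which activate the feature.
    sample = [0.0] * 995 + [1.2] * 5
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a sparse feature is one that is active on only a small fraction of tokens, which is the typical regime for sparse autoencoder features.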