INDEX
Explanations
phrases suggesting contradictions or nuances in character or societal roles
Negative Logits
aggi          -0.15
unfinished    -0.15
oka           -0.15
iaux          -0.14
Tro           -0.14
899           -0.14
ient          -0.14
629           -0.13
sek           -0.13
898           -0.13
Positive Logits
immune        0.59
inv           0.49
immune        0.46
immunity      0.40
bullet        0.39
Imm           0.39
imp           0.38
-imm          0.37
imm           0.36
immun         0.34
Activations Density 0.359%