INDEX
Explanations
phrases related to prioritization of personal interests over collective interests, possibly warning against it
instances of the word "respond."
Negative Logits
Syri         -0.67
Hels         -0.63
Fas          -0.63
Ü            -0.62
lain         -0.62
ernaut       -0.61
maid         -0.61
dies         -0.60
alsh         -0.59
ieth         -0.59
Positive Logits
=#           0.83
rant         0.79
anchester    0.67
peria        0.66
Publication  0.66
teasing      0.66
otti         0.66
Ħ¢           0.65
Magikarp     0.64
âĶľ          0.63
Activations Density 0.000%