INDEX
Explanations
phrases related to someone wanting or requesting something from others
references to desires or requests made by individuals
Negative Logits
ibe         -0.71
ggles       -0.67
VIDIA       -0.66
requires    -0.62
srfAttach   -0.61
berra       -0.60
guyen       -0.58
gression    -0.56
pite        -0.55
ipop        -0.53
Positive Logits
to          1.08
deported    0.95
gone        0.90
punished    0.88
cleaned     0.81
removed     0.79
to          0.79
prosecuted  0.79
silenced    0.78
eliminated  0.77
Activations Density: 0.131%