INDEX
Explanations
mentions or references to specific individuals
Negative Logits
cffff        -0.74
iller        -0.71
ilon         -0.69
avorite      -0.67
whiff        -0.66
oppable      -0.62
resisted     -0.61
resistance   -0.60
é¾įå         -0.58
cape         -0.58
Positive Logits
irect        0.86
reference    0.79
rers         0.79
rences       0.78
itatively    0.75
minist       0.73
Reference    0.73
entious      0.71
ename        0.71
ãĥĥ          0.70
Activations Density: 0.765%