Explanations
people's names
mentions of specific individuals, particularly those named Curtis and Vand
Negative Logits
ances   -0.82
orthy   -0.75
arent   -0.70
aced    -0.68
ously   -0.66
lies    -0.66
iot     -0.66
ods     -0.65
HF      -0.64
ded     -0.64
Positive Logits
prus        0.82
selage      0.70
eer         0.70
©¶æ¥µ       0.69
plin        0.67
ãĥ¼ãĥĨãĤ£   0.67
²¾          0.66
ãĥģ         0.66
allas       0.66
anguage     0.65
Activations Density: 0.074%