Explanations
names of famous individuals
mentions of specific names or identifiers, particularly related to individuals and their actions
Negative Logits
ł          -0.77
cember     -0.76
Fighter    -0.76
ĺħ         -0.75
²¾         -0.73
Pole       -0.72
awei       -0.69
ļéĨĴ       -0.69
à¼         -0.69
Pug        -0.66
Positive Logits
sis        1.00
autions    0.89
ement      0.86
ading      0.84
ilitating  0.77
rien       0.73
aged       0.73
iliated    0.72
Trou       0.72
bles       0.72
Activations Density 0.022%