INDEX
Explanations
mentions of specific names or proper nouns
names and references to specific individuals or entities
Negative Logits
rum         -0.84
mented      -0.81
matically   -0.80
glim        -0.77
shenan      -0.73
rers        -0.71
ging        -0.71
fully       -0.71
pse         -0.70
ged         -0.70
Positive Logits
tera        0.92
terday      0.73
alon        0.72
xual        0.72
Centauri    0.71
bian        0.70
oulos       0.68
odus        0.67
ylon        0.67
uclear      0.65
Activations Density: 0.028%