INDEX
Explanations
words related to political figures or events
names and references to specific individuals or characters, particularly those in political or entertainment contexts
Negative Logits
ãĤ¦       -0.74
lished    -0.64
ORTS      -0.62
mete      -0.57
Dino      -0.57
âĢ¢âĢ¢    -0.55
ãĥĨ       -0.55
feces     -0.54
yuan      -0.54
Rated     -0.53
Positive Logits
issan     0.90
mort      0.88
xton      0.84
ttle      0.82
pson      0.78
eus       0.74
essen     0.74
metic     0.73
teenth    0.73
ault      0.70
Activations Density 0.069%
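
Several tokens in the negative logits list (ãĤ¦, âĢ¢âĢ¢, ãĥĨ) look like byte-level BPE display strings rather than readable text. The sketch below shows one way to turn them back into characters. It assumes the tokens use the GPT-2-style byte-to-unicode table, which this page does not confirm, and `decode_token` is a hypothetical helper name, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> display-character table (assumed here)."""
    # Printable bytes are displayed as themselves.
    bs = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    cs = bs[:]
    n = 0
    # Every remaining byte is displayed as code point 256 + n, in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Hypothetical helper: map a display string back to UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    # Partial multi-byte sequences decode to U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e3\u0124\u00a6", "\u00e2\u0122\u00a2\u00e2\u0122\u00a2", "\u00e3\u0125\u0128", "feces"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĤ¦ decodes to ウ, âĢ¢âĢ¢ to ••, and ãĥĨ to テ, while plain ASCII tokens such as feces come back unchanged.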