INDEX
Explanations
multi-syllable names that may represent diverse cultural or political references
names of people or entities, particularly those related to specific individuals or characters
Negative Logits
Token      Logit
MW         -0.77
LY         -0.75
MR         -0.74
LIN        -0.71
Monthly    -0.70
ML         -0.70
LAN        -0.70
writer     -0.70
itable     -0.69
lad        -0.68
Positive Logits
Token      Logit
gio        1.13
abba       0.91
agn        0.79
ĸļ         0.78
ascus      0.78
acca       0.75
rals       0.75
orsi       0.71
riction    0.70
uters      0.66
Activations Density: 0.034%