Explanations
names of political figures and entities
references to specific individuals or characters, particularly those with initials or single-letter identifiers
Negative Logits
Token         Logit
SOURCE        -0.73
Trust         -0.66
Beacon        -0.65
ablishment    -0.65
Atlantic      -0.65
FontSize      -0.64
Mercury       -0.63
Niagara       -0.63
Connie        -0.62
Times         -0.62
Positive Logits
Token         Logit
oub           1.02
ollah         0.88
abis          0.88
utsch         0.86
ĪĴ            0.84
inav          0.84
hai           0.80
akh           0.80
ymes          0.80
agh           0.80
Activations Density: 0.142%
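
As a rough illustration of how a density figure like this could be read, the sketch below treats activation density as the fraction of token positions on which the feature's activation is above a threshold. That definition is an assumption, not something this page states. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active. The dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 14 of which are active.
sample = [0.0] * 9986 + [1.5] * 14
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.142% would mean the feature is active on roughly 14 of every 10,000 token positions.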