INDEX
Explanations
proper nouns, most likely names of individuals
references to a specific rank or title, particularly "Maj" (Major)
Negative Logits
Token      Logit
ATED       -0.70
OTH        -0.62
é¾į        -0.59
Californ   -0.59
OVER       -0.57
heating    -0.57
Charges    -0.56
tipping    -0.56
Riot       -0.55
INESS      -0.55
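The "é¾į" entry in the negative-logits list looks like a byte-level BPE token shown in its raw vocabulary form rather than encoding damage. The sketch below assumes, without confirmation from this page, that the underlying tokenizer uses the GPT-2 byte-to-character mapping; the function names `gpt2_byte_decoder` and `decode_token` are hypothetical helpers, not part of any dashboard API.

```python
def gpt2_byte_decoder():
    """Build the display-character -> byte table used by GPT-2 style byte-level BPE.

    Assumption: the tokenizer behind this feature follows the GPT-2 convention,
    where printable bytes map to themselves and all other bytes are shifted
    to code points starting at 256.
    """
    keep = set(range(33, 127)) | set(range(161, 173)) | set(range(174, 256))
    table = {}
    shift = 0
    for b in range(256):
        if b in keep:
            table[chr(b)] = b
        else:
            table[chr(256 + shift)] = b
            shift += 1
    return table


def decode_token(display):
    """Turn a vocabulary display string back into readable text."""
    table = gpt2_byte_decoder()
    raw = bytes(table[ch] for ch in display)
    # Partial multi-byte tokens are possible, so replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


# The token from the table, written with escapes to avoid copy/paste issues.
token = "\u00e9\u00be\u012f"
print(decode_token(token))
```

Under this assumption the three display characters stand for the bytes E9 BE 8D, which is the UTF-8 encoding of the single character 龍 (U+9F8D). If the feature comes from a model with a different tokenizer, the mapping does not apply.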
Positive Logits
Token      Logit
estic      1.35
ewski      1.13
esty       1.00
lis        0.93
adier      0.91
itte       0.90
dal        0.89
zinski     0.85
ouri       0.83
undai      0.82
Activations Density: 0.034%
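To put the density figure in perspective, here is a minimal sketch that assumes "activations density" means the fraction of sampled tokens on which the feature's activation is above zero. The page does not state this definition, and `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density counts tokens with activation strictly above zero.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: one active token out of four.
sample = [0.0, 0.0, 0.5, 0.0]
print(f"toy density: {activation_density(sample):.2%}")

# The reported figure, converted from a percentage to a fraction.
reported = 0.034 / 100
print(f"about one active token in every {round(1 / reported)} tokens")
```

Read this way, 0.034% corresponds to the feature firing on roughly one token in every 2,941, which fits a sparse feature tied to name-like suffixes such as "ewski" and "zinski".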