INDEX
Explanations
proper nouns, likely related to people, places, or organizations
mentions of specific people or characters
Negative Logits
Token      Logit
ASA        -0.88
Hots       -0.78
acron      -0.74
impulse    -0.73
ンジ       -0.71
tram       -0.70
Empires    -0.70
Astro      -0.69
paras      -0.68
turkey     -0.68
Positive Logits
Token      Logit
ll          1.75
LL          1.49
oll         1.29
ill         1.26
ELL         1.14
lla         1.13
ell         1.10
lled        1.10
ills        1.10
llo         1.10
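On feature dashboards of this kind, the positive and negative logit lists are commonly computed as the feature's direct effect on the output: the feature's decoder direction is dotted with each token's unembedding vector, and the resulting scores are sorted. The sketch below assumes that definition. The function names `logit_effects` and `top_k`, the token names, and all vectors are hypothetical toy values, not data from this feature.

```python
# Hypothetical sketch, assuming: logit effect of token t = dot(decoder_dir, W_U[:, t]).
# All tokens and numbers below are made-up toy values for illustration only.

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: score}, where score = dot(decoder_dir, unembed[:, t])."""
    d_model = len(decoder_dir)
    scores = {}
    for t, token in enumerate(vocab):
        # Column t of the unembedding matrix is token t's output direction.
        scores[token] = sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
    return scores


def top_k(scores, k, positive=True):
    """Return the k (token, score) pairs with the highest (or lowest) scores."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


if __name__ == "__main__":
    vocab = ["tok0", "tok1", "tok2", "tok3"]
    decoder_dir = [1.0, -0.5]      # toy 2-dimensional feature direction
    unembed = [                    # toy W_U with shape (d_model=2, vocab=4)
        [1.5, 1.2, -0.3, -0.6],
        [-0.5, -0.4, 0.8, 0.6],
    ]
    scores = logit_effects(decoder_dir, unembed, vocab)
    print("positive:", top_k(scores, 2, positive=True))
    print("negative:", top_k(scores, 2, positive=False))
```

Sorting a single score vector in both directions is why one feature yields both lists: tokens whose unembedding aligns with the decoder direction are promoted, and tokens pointing the opposite way are suppressed.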
Activation Density: 0.239%
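Activation density is typically the percentage of tokens in a sample on which the feature fires. A density of 0.239% therefore means roughly 2.39 active tokens per 1,000. The sketch below assumes "fires" means an activation strictly above zero; the threshold, the function name `activation_density`, and the toy activations are illustrative assumptions, not this dashboard's actual procedure.

```python
# Hypothetical sketch, assuming a feature is "active" when its activation > threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of entries in `activations` that exceed `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy data: 3 active tokens out of 1,000.
    acts = [0.0] * 997 + [2.1, 0.4, 5.0]
    print(f"Activation density: {activation_density(acts):.3f}%")
```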