INDEX
Explanations
proper names, specifically focusing on names of places and people
mentions of specific names or entities
Negative Logits
lda           -0.84
============  -0.80
HER           -0.78
OF            -0.75
glers         -0.75
========      -0.75
HL            -0.73
BLIC          -0.72
ゴン          -0.71
selage        -0.71
Positive Logits
shire         0.94
axter         0.82
ellen         0.80
Kut           0.78
enton         0.77
icum          0.77
skill         0.76
s             0.76
roach         0.76
owed          0.74
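A minimal sketch of how lists like the two above are commonly produced, assuming (not confirmed by this page) that each value is the dot product of the feature's decoder direction with a token's unembedding vector, ranked to give the most negative and most positive tokens. The function names and the two-dimensional toy vectors are hypothetical illustrations, not this feature's actual weights.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens, each list ordered by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

# Toy, made-up vectors (assumption): real models use d_model-sized vectors and a full vocabulary.
decoder_dir = [1.0, 0.0]
unembed = {
    "shire": [0.9, 0.1],
    "axter": [0.8, 0.0],
    "lda": [-0.8, 0.3],
    "OF": [-0.7, 0.0],
    "s": [0.1, 0.5],
}
negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("negative:", negative)
print("positive:", positive)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; a real dashboard would more likely use a vectorized matrix product and a top-k routine over the whole vocabulary.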
Activation Density: 0.027%
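Read as a rate, 0.027% means the feature fires on roughly 27 of every 100,000 tokens. The sketch below assumes density is the percentage of token positions where the activation exceeds zero; the page does not state its exact threshold, and the function name is hypothetical.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds the threshold (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Illustrative data only: 100,000 positions, of which 27 are active.
acts = [0.0] * 100_000
for i in range(27):
    acts[i * 3_000] = 1.5
density = activation_density(acts)
print(f"{density:.3f}%")
```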