Explanations
words related to names and specific terms, possibly related to political figures or events
tokens related to names or identifiers, particularly focused on a specific character or entity throughout the text
Negative Logits
ahime        -0.72
kamp         -0.61
nw           -0.60
holiest      -0.56
Franch       -0.56
tresp        -0.54
jong         -0.54
developing   -0.51
ung          -0.51
eton         -0.51
Positive Logits
dden          0.68
avis          0.67
ody           0.66
Marginal      0.64
Pick          0.62
atche         0.62
icz           0.62
ĵĺ            0.61
iversal       0.60
anyl          0.60
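The Negative and Positive Logits lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such lists could be produced. It assumes (the page does not say so) that each score is the feature's decoder direction projected through the model's unembedding matrix. `decoder_dir`, `unembed`, `logit_scores`, and `top_logit_tokens` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def logit_scores(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Project a hypothetical feature decoder direction through an unembedding.

    Assumption: unembed is shaped [d_model][vocab_size], and vocab[j]
    names the token for column j.
    """
    scores: Dict[str, float] = {}
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with the unembedding column for token j.
        scores[token] = sum(d * row[j] for d, row in zip(decoder_dir, unembed))
    return scores


def top_logit_tokens(
    scores: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest- and k lowest-scoring tokens."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative
```

For example, with four scores copied from the lists above, `top_logit_tokens(scores, k=2)` returns `dden` and `avis` as the positive tokens and `ahime` and `kamp` as the negative ones.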
Activation Density: 0.532%
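Activation density summarizes how often the feature fires. A minimal sketch follows, assuming density is the percentage of tokens whose activation exceeds a threshold of zero. The page does not state its exact definition or threshold, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds threshold (assumed definition)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)  # count firing tokens
    return 100.0 * active / len(activations)
```

Under this definition, a density of 0.532% means the feature is active on roughly 5 of every 1,000 tokens.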