INDEX
Explanations
names of specific individuals, likely related to news or events
particular strings of capital letters and specific punctuation or formatting cues
Negative Logits
Bel        -0.85
Arnold     -0.85
Lod        -0.79
Lewis      -0.79
Dwell      -0.79
Undead     -0.78
stal       -0.77
Shelley    -0.76
Lords      -0.75
в          -0.75
Positive Logits
ap          1.54
AP          1.46
aps         1.46
APS         1.33
ip          1.32
apping      1.25
EP          1.24
mp          1.21
apped       1.17
apo         1.16
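Logit tables like the two above are commonly produced by projecting a feature's decoder direction onto each token's unembedding vector and keeping the largest and smallest values. The sketch below assumes that method (and ignores the final LayerNorm); the source does not state how these numbers were computed. The names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `unembed` are hypothetical, and the toy vectors are illustrative rather than real model weights.

```python
def logit_effects(decoder_dir, unembed):
    """Dot each token's unembedding column with the feature direction.

    decoder_dir: list of floats (hypothetical feature decoder direction).
    unembed: dict mapping token -> list of floats (hypothetical W_U columns).
    Assumption: the final LayerNorm is ignored in this sketch.
    """
    return {tok: sum(a * b for a, b in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[-k:][::-1], ranked[:k]


# Toy 2-dimensional example; values are made up for illustration only.
decoder_dir = [1.0, -0.5]
unembed = {
    "ap": [1.2, -0.6],      # 1.2 + 0.3  = 1.5
    "AP": [1.0, -0.8],      # 1.0 + 0.4  = 1.4
    "Bel": [-0.6, 0.5],     # -0.6 - 0.25 = -0.85
    "Arnold": [-0.4, 0.7],  # -0.4 - 0.35 = -0.75
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), 2)
print(positive)
print(negative)
```

Sorting the full vocabulary once and slicing both ends keeps the positive and negative tables consistent with each other; for a real vocabulary of tens of thousands of tokens a partial selection (e.g. `heapq.nlargest`) would avoid the full sort.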
Activation Density 0.271%
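An activation density of 0.271% means the feature fires on roughly 2.71 of every 1,000 tokens. A minimal sketch, assuming density is the fraction of token positions whose activation is strictly above zero (the dashboard does not state its threshold), could look like this; `activation_density` and the toy activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activates above threshold.

    Assumption: a token counts as "firing" when its activation > threshold.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy example: 3 firing positions out of 1,000 tokens -> 0.3% density.
acts = [0.0] * 997 + [0.4, 1.2, 0.9]
density = activation_density(acts)
print(f"{density:.3%}")
```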