Explanations
mentions of people's names ending with "mon"
Negative Logits
Norn     -0.70
ACTED    -0.66
Paste    -0.62
ENG      -0.61
OHN      -0.60
î        -0.60
NRS      -0.59
EMP      -0.59
tomat    -0.59
RED      -0.59
Positive Logits
itored   1.17
etary    1.10
strous   1.06
ial      1.05
uclear   1.01
iac      0.99
ials     0.96
astery   0.90
ter      0.87
ious     0.86
Activations Density: 0.021%