Explanations
names of people involved in various situations or events
proper nouns, particularly names of individuals
Negative Logits
ãĤ¡         -0.80
Americ      -0.67
CRE         -0.66
tml         -0.60
SAM         -0.57
Dresden     -0.56
Kingdoms    -0.55
soType      -0.55
category    -0.55
ãĥ¬         -0.54
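The tokens ãĤ¡ and ãĥ¬ in the negative-logit list look like raw byte-level BPE vocabulary entries rather than corrupted text. The sketch below shows one way to turn such entries back into readable characters. It assumes the underlying model uses a GPT-2-style byte-to-character table, which this page does not confirm. The helper names `bytes_to_unicode` and `decode_token` are illustrative only and not part of any tool referenced here.

```python
def bytes_to_unicode():
    """Rebuild a GPT-2-style table mapping each byte (0-255) to a printable character.

    Assumption: the tokens shown on this page come from a vocabulary that
    uses this byte-level scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to raw bytes, then decode them as UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¡", "ãĥ¬", "Americ"]:
        print(tok, "->", decode_token(tok))
```

Under that assumption, the two entries decode to the katakana characters ァ and レ, while plain ASCII tokens such as Americ come back unchanged.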
Positive Logits
himself       0.97
vetoed        0.95
reportedly    0.95
testified     0.94
apologized    0.93
responded     0.90
allegedly     0.90
replied       0.88
denies        0.87
countered     0.87
Activations Density 0.401%