INDEX
Explanations
names of individuals and their involvement in various contexts
Negative Logits
ãĤ¡    -0.68
CRE    -0.67
Americ    -0.63
ãĥ¬    -0.62
0200    -0.60
ween    -0.59
..."    -0.58
1500    -0.57
SAM    -0.57
2500    -0.56
Positive Logits
reportedly    1.12
meanwhile    0.95
denies    0.95
responded    0.95
apologized    0.94
famously    0.94
allegedly    0.91
testified    0.91
sacked    0.90
vetoed    0.88
Activations Density 0.187%