Explanations
specific identifiers and references related to historical events or figures
Negative Logits
’↵↵     -0.17
‘       -0.16
[…]     -0.16
vids    -0.16
—↵↵     -0.16
→↵↵     -0.15
’ll     -0.15
…       -0.14
!’      -0.14
–       -0.14
Positive Logits
unidentified    0.28
possibly        0.23
verso           0.20
probably        0.20
probable        0.19
likely          0.19
identified      0.19
automobiles     0.19
""              0.18
Possibly        0.18
Activations Density 0.037%
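
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; the page itself does not state how the value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: the source page does not specify the formula.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 37 of which activate the feature.
acts = [0.0] * 99_963 + [1.5] * 37
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.037%
```

Under this reading, a density of 0.037% corresponds to roughly 37 active positions per 100,000 tokens.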