Explanations
mentions of fictional works and their relationship to reality or true stories
Negative Logits
ickets     -0.15
anela      -0.15
Jury       -0.15
zbollah    -0.14
antly      -0.14
letcher    -0.14
wet        -0.14
İY         -0.14
оÑģоб      -0.14
gut        -0.13
Positive Logits
bens       0.16
vere       0.16
appy       0.15
eler       0.14
urge       0.14
datatype   0.14
inta       0.14
dra        0.14
Zug        0.14
MRI        0.13
Activations Density 0.180%