INDEX
Explanations
proper nouns or specific names
occurrences of the word "called" in various contexts
Negative Logits
edia        -0.84
olitics     -0.75
bilt        -0.71
inth        -0.66
Leaks       -0.65
feat        -0.63
isode       -0.63
SPONSORED   -0.62
isman       -0.62
_-          -0.61
Positive Logits
@#&         0.68
'           0.66
forth       0.65
upon        0.62
`           0.61
phas        0.61
attention   0.60
"           0.60
''          0.59
disorderly  0.59
Activations Density 0.063%