Explanations
names of people or places
proper nouns, particularly names
Negative Logits
bound       -0.73
bre         -0.67
breaking    -0.61
position    -0.60
ivated      -0.60
hillary     -0.57
stri        -0.57
breakers    -0.56
imm         -0.56
con         -0.56
Positive Logits
's          0.70
herself     0.69
Sr          0.67
commented   0.64
stals       0.64
ites        0.63
iev         0.62
inen        0.61
enegger     0.61
KE          0.61
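The two lists above rank vocabulary tokens by how much this feature lowers (negative) or raises (positive) their output logits. The sketch below is a minimal illustration that assumes the common dashboard approach: take the dot product of the feature's decoder direction with each token's unembedding vector, then keep the k lowest and k highest scores. The function name `top_logit_effects`, the 2-dimensional vectors, and the toy vocabulary are hypothetical and do not reproduce the values shown above.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive token scores.

    Assumption: a token's score is the dot product of the feature's
    decoder direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy unembedding vectors, for illustration only.
    toy_unembed = {
        "herself": [0.9, 0.1],
        "'s": [0.8, 0.0],
        "bound": [-0.7, 0.0],
        "breaking": [-0.5, -0.2],
        "position": [-0.4, -0.3],
    }
    neg, pos = top_logit_effects([1.0, 0.5], toy_unembed, k=2)
    print("negative:", [t for t, _ in neg])
    print("positive:", [t for t, _ in pos])
```

With these toy inputs the negative list is `bound`, `breaking` and the positive list is `herself`, `'s`. A real dashboard would do the same ranking over the full vocabulary, usually as one matrix product.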
Activations Density 0.290%
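A density of 0.290% means the feature fires on roughly 29 out of every 10,000 tokens. The sketch below assumes density is the percentage of tokens whose activation exceeds a threshold of zero. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


if __name__ == "__main__":
    # One active token out of four gives a density of 25.0%.
    print(activation_density([0.0, 0.3, 0.0, 0.0]))
```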