INDEX
Explanations
words related to locations or geographical areas
references to personal experience or identity
Negative Logits
kefeller    -0.75
keyes       -0.74
ernels      -0.69
paralle     -0.69
Reviewed    -0.66
rama        -0.64
iosity      -0.64
olor        -0.63
rieved      -0.61
hips        -0.61
Positive Logits
anwhile     1.30
asure       1.26
lda         1.11
zzo         1.05
ister       0.99
adows       0.99
eting       0.97
leon        0.96
isters      0.96
adow        0.95
Activations Density: 0.017%