INDEX
Explanations
prepositions denoting a location or direction
occurrences of the word "in"
Negative Logits
Cursed            -0.76
Cock              -0.69
hack              -0.64
Charlottesville   -0.62
Seah              -0.61
Debor             -0.61
Frie              -0.61
Parenthood        -0.61
Bangl             -0.60
Ethnic            -0.60
Positive Logits
iors               0.75
ifice              0.71
aples              0.70
nery               0.69
vier               0.69
viron              0.67
BELOW              0.66
uin                0.65
cific              0.64
atos               0.64
Activations Density: 0.000%