INDEX
Explanations
names of locations and people
proper nouns and terms related to specific people and places
Negative Logits
rooting      -0.76
metic        -0.75
stump        -0.69
craving      -0.64
unsett       -0.63
departing    -0.63
roundup      -0.63
heav         -0.62
mildly       -0.62
ermott       -0.62
Positive Logits
gard         1.22
ner          1.03
gart         0.97
ners         0.92
alist        0.91
ente         0.89
ility        0.88
lasses       0.88
pipe         0.88
ens          0.87
Activations Density 0.025%
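
The density figure can be read as the share of tokens on which the feature is active. The page does not say how it is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = (active tokens) / (all tokens), expressed in percent.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 1 token out of 4000.
sample = [0.0] * 3999 + [1.7]
print(f"{activation_density(sample):.3f}%")  # prints 0.025%
```

Under this reading, a density of 0.025% corresponds to roughly one active token in every 4000.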