INDEX
Explanations
locations or proper nouns associated with specific places
proper nouns, particularly names of places and organizations
Negative Logits
satisfaction    -0.64
unfavorable     -0.60
disposable      -0.59
behav           -0.58
craving         -0.58
prejudice       -0.57
favourable      -0.57
suspic          -0.57
overwhelming    -0.57
favorable       -0.56
Positive Logits
ornings         0.73
lore            0.68
itars           0.66
anky            0.64
esan            0.63
agent           0.63
jon             0.63
itus            0.61
wyn             0.60
selage          0.59
Activations Density: 0.859%