INDEX
Explanations
words related to geographical locations or proper nouns
references to specific locations and their host cities
Negative Logits
Token      Logit
rd         -0.80
Reviewer   -0.70
[+         -0.69
Situation  -0.65
NRS        -0.63
requ       -0.62
ofi        -0.62
football   -0.61
ERAL       -0.60
ools       -0.58
Positive Logits
Token      Logit
auga       1.64
Mississ    1.37
gow        0.89
forth      0.77
Osh        0.75
atoon      0.75
lehem      0.70
Bram       0.70
adish      0.69
aret       0.69
Activations Density: 0.021%