INDEX
Explanations
mentions of specific locations or addresses
articles and demonstratives referring to nouns
Negative Logits
Vaugh        -0.71
Seym         -0.70
helicop      -0.69
ilaterally   -0.66
destro       -0.65
anamo        -0.62
Moroc        -0.59
onto         -0.59
ãĥ¼ãĥĨ       -0.58
vulner       -0.57
Positive Logits
largeDownload   0.84
nutshell        0.76
Rock            0.58
Ĥİ              0.58
heels           0.56
Pixel           0.53
grate           0.52
Rapids          0.50
peninsula       0.50
Baltimore       0.50
Activations Density 0.211%