INDEX
Explanations
words related to political figures or locations
words related to specific names or nouns
Negative Logits
schild      -0.67
recomm      -0.64
CLASS       -0.62
ovych       -0.61
caution     -0.59
lamb        -0.58
fury        -0.57
shine       -0.57
upset       -0.56
goodwill    -0.55
Positive Logits
ulhu        0.95
pillar      0.93
estine      0.86
arette      0.85
rera        0.85
ãĤ¼ãĤ¦ãĤ¹   0.83
berus       0.83
illo        0.80
inelli      0.80
emporary    0.76
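The token `ãĤ¼ãĤ¦ãĤ¹` in the positive logits looks like the raw vocabulary form of a byte-level BPE token rather than readable text. The sketch below shows one way to turn such a string back into text, assuming (the page does not say so) that the model uses a GPT-2-style byte-level tokenizer. `bytes_to_unicode` follows the well-known GPT-2 byte-to-character table; `decode_token` is a hypothetical helper name introduced here for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: vocabulary character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map a vocabulary string back to UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Token fragments can split a multi-byte character, so do not fail hard.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¼ãĤ¦ãĤ¹"))  # prints ゼウス
```

Under that assumption the token decodes to the katakana string ゼウス. Plain ASCII tokens such as `pillar` pass through unchanged, because printable bytes map to themselves in this table.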
Activations Density: 0.106%
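For context on the density figure, here is a minimal sketch of how such a number could be computed, assuming it means the fraction of sampled tokens on which the feature has a non-zero activation. That definition is an assumption, not something stated on this page, and both `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up example data: 3 active tokens out of 1000.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.106% would mean the feature fires on roughly one token in a thousand.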