Explanations
words indicating economic responsibility or implications
Negative Logits
Token       Logit
çĶ          -0.15
ego         -0.14
lak         -0.14
Bow         -0.14
Goldberg    -0.13
/us         -0.13
सन          -0.13
ullan       -0.13
ein         -0.13
alk         -0.13
Positive Logits
Token       Logit
arty        0.16
ilon        0.15
Pound       0.15
ULA         0.15
angler      0.14
765         0.14
Marty       0.14
idad        0.14
-pattern    0.14
kor         0.14
Activations Density: 0.465%
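For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of sampled token positions on which the feature's activation is above zero. This is an assumption about the metric, not something the page itself states. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 5 of them active.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.465% would mean the feature fires on roughly 1 in 215 sampled tokens.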