Explanations
words that indicate a significant degree of impact or influence
Negative Logits
large        -0.16
омен         -0.15
-sized       -0.14
-large       -0.14
Famous       -0.14
enger        -0.14
Nä           -0.14
strong       -0.14
blindness    -0.14
Meng         -0.14

Positive Logits
asca         0.18
outnumber    0.16
denn         0.15
.masks       0.15
μι           0.14
lac          0.14
unsch        0.14
udo          0.14
bes          0.14
.vs          0.14
Activations Density: 0.060%