INDEX
Explanations
terms related to ableism
Head Attr Weights
0:0.07
1:0.07
2:0.07
3:0.09
4:0.09
5:0.07
6:0.08
7:0.09
8:0.09
9:0.06
10:0.08
11:0.08
Negative Logits
asar: -1.75
icably: -1.62
bek: -1.54
iannopoulos: -1.47
hov: -1.39
uther: -1.39
onymous: -1.39
uchin: -1.39
oulos: -1.37
arnaev: -1.37
Positive Logits
contribution: 1.69
clipboard: 1.57
Conquer: 1.44
��: 1.42
parity: 1.42
Solitaire: 1.42
simulac: 1.39
levels: 1.38
�: 1.37
niche: 1.31
Activations Density 0.000%