INDEX
Explanations
the concept of friendship
Head Attr Weights
0:0.08
1:0.08
2:0.07
3:0.08
4:0.07
5:0.07
6:0.07
7:0.09
8:0.07
9:0.08
10:0.09
11:0.09
Negative Logits
fem: -2.93
sing: -2.84
menstru: -2.75
sexual: -2.64
noun: -2.61
femin: -2.57
vowel: -2.54
maternal: -2.50
Saiyan: -2.50
pronouns: -2.46
Positive Logits
Marlins: 2.96
Moz: 2.87
kefeller: 2.79
Oman: 2.62
Zot: 2.61
Moroc: 2.53
Anon: 2.50
Phill: 2.48
Tripoli: 2.48
untled: 2.47
Activations Density 0.000%