INDEX
Explanations
phrases related to social behavior and interactions among people
Head Attr Weights
0:0.02
1:0.02
2:0.21
3:0.27
4:0.05
5:0.04
6:0.06
7:0.05
8:0.04
9:0.07
10:0.08
11:0.04
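Most of the listed attribution weight sits in heads 3 and 2 (0.27 + 0.21 = 0.48, out of a listed total of 0.95). The short sketch below ranks the heads using the values shown above and computes the share carried by the top two. The helper names `top_heads` and `share_of_total` are illustrative only and are not part of any dashboard API.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_attr = {0: 0.02, 1: 0.02, 2: 0.21, 3: 0.27, 4: 0.05, 5: 0.04,
             6: 0.06, 7: 0.05, 8: 0.04, 9: 0.07, 10: 0.08, 11: 0.04}

def top_heads(weights, k=2):
    """Return the k head indices with the largest weight, largest first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]

def share_of_total(weights, heads):
    """Fraction of the summed weight carried by the given heads."""
    total = sum(weights.values())
    return sum(weights[h] for h in heads) / total

heads = top_heads(head_attr)
print(heads)  # [3, 2]
print(round(share_of_total(head_attr, heads), 3))
```

The share is computed against the listed total (0.95), not against 1.0, because the displayed values are what we have to work with.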
Negative Logits
backing      -1.46
berus        -1.41
Schne        -1.34
Bened        -1.33
rium         -1.33
bered        -1.23
atever       -1.20
pora         -1.19
formation    -1.19
Marino       -1.19
Positive Logits
"]=>         2.53
·            1.83
sqor         1.80
embed        1.74
avid         1.68
aden         1.66
ヘラ         1.62
david        1.60
ディ         1.55
Posted       1.51
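Feature dashboards of this kind typically build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k largest and k smallest entries. The sketch below assumes that procedure. `decoder_dir`, `W_U`, and the toy vocabulary are hypothetical stand-ins filled with random numbers, not this model's weights, so the printed tokens will not match the lists above.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Project a feature direction through the unembedding (assumed procedure).

    decoder_dir: shape (d_model,), the feature's decoder row (hypothetical).
    W_U: shape (d_model, d_vocab), the unembedding matrix (hypothetical).
    Returns (top_positive, top_negative) as lists of (token, logit) pairs.
    """
    logits = decoder_dir @ W_U          # direct effect on every token's logit
    order = np.argsort(logits)          # indices sorted by ascending logit
    top_pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    top_neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    return top_pos, top_neg

# Toy example with made-up dimensions and token names.
rng = np.random.default_rng(0)
d_model, d_vocab = 8, 20
vocab = [f"tok{i}" for i in range(d_vocab)]
W_U = rng.normal(size=(d_model, d_vocab))
decoder_dir = rng.normal(size=d_model)

pos, neg = logit_effects(decoder_dir, W_U, vocab, k=3)
print(pos)
print(neg)
```

Sorting once and slicing from both ends yields both lists in a single pass; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid the full sort.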
Activation Density: 0.025%
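A density of 0.025% corresponds to the feature firing on roughly 1 token in 4,000 (0.00025 × 4,000 = 1). The sketch below assumes density means the fraction of token positions where the activation is strictly positive. That definition is common for SAE dashboards but is not stated on this card, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Toy data: the feature fires on 1 of 4,000 tokens.
acts = [0.0] * 4000
acts[123] = 3.7
density = activation_density(acts)
print(f"{density:.3%}")  # 0.025%
```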