INDEX
Explanations
phrases related to social relationships and interactions, particularly in the context of language and communication
Head Attr Weights
0:0.09
1:0.02
2:0.10
3:0.11
4:0.07
5:0.03
6:0.07
7:0.03
8:0.05
9:0.08
10:0.07
11:0.22
Negative Logits
cms
-1.57
ーティ
-1.56
DragonMagazine
-1.55
lar
-1.54
Boise
-1.50
clus
-1.49
fest
-1.46
Ult
-1.44
baugh
-1.42
beck
-1.41
Positive Logits
vidia
1.70
olar
1.63
":[{"
1.57
hiba
1.57
>[
1.51
agos
1.47
"},
1.46
immune
1.46
"],
1.45
imus
1.43
Activations Density 0.021%