INDEX
Explanations
words that indicate community engagement and interaction
Head Attr Weights
0:0.02
1:0.01
2:0.06
3:0.05
4:0.05
5:0.03
6:0.47
7:0.06
8:0.03
9:0.05
10:0.06
11:0.05
Negative Logits
phia:-1.68
VICE:-1.29
fault:-1.27
ा:-1.23
HAEL:-1.21
PsyNetMessage:-1.19
theless:-1.16
969:-1.16
itness:-1.14
chambers:-1.13
Positive Logits
ikuman:1.63
ć:1.61
ovi:1.59
cano:1.48
神:1.45
ibaba:1.45
č:1.44
ago:1.44
soon:1.41
』:1.38
Activations Density 0.005%