INDEX
Explanations
phrases related to community guidelines and respectful communication
Head Attr Weights
0:0.08
1:0.11
2:0.03
3:0.11
4:0.07
5:0.14
6:0.08
7:0.06
8:0.04
9:0.13
10:0.07
11:0.03
Negative Logits
��:-2.67
acron:-2.36
Scheme:-2.34
Tile:-2.31
Agric:-2.26
Solitaire:-2.25
IRC:-2.17
Slime:-2.14
Secondly:-2.14
Templ:-2.13
Positive Logits
bott:2.42
ils:2.39
Gab:2.35
ohan:2.33
dy:2.29
dent:2.23
Gab:2.18
Vulkan:2.18
Dol:2.17
ordan:2.16
Activations Density 0.002%