Explanations
concepts related to worldview or perspectives
Head Attr Weights
0:0.08
1:0.09
2:0.11
3:0.08
4:0.08
5:0.07
6:0.07
7:0.07
8:0.07
9:0.07
10:0.07
11:0.09
Negative Logits
ワン  -3.11
龍喚士  -2.60
ヘラ  -2.54
裏�  -2.47
Dust  -2.43
ァ  -2.29
д  -2.27
":["  -2.27
�  -2.25
ノ  -2.12
Positive Logits
(token not shown)  2.33
Argon  2.21
Mono  2.03
Ontario  2.02
NYT  2.02
apex  2.02
MT  1.96
phosphate  1.95
AE  1.90
prolifer  1.86
Activations Density 0.000%