INDEX
Explanations
names of people, especially those involved in political or public affairs
names, particularly those of notable individuals
Head Attribution Weights
Head 0: 0.08
Head 1: 0.03
Head 2: 0.18
Head 3: 0.07
Head 4: 0.16
Head 5: 0.05
Head 6: 0.03
Head 7: 0.04
Head 8: 0.05
Head 9: 0.17
Head 10: 0.06
Head 11: 0.03
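The per-head weights above split the feature's attribution across 12 attention heads. The dashboard does not say how they are computed. The sketch below assumes one plausible definition for an SAE trained on concatenated per-head attention outputs: cut the feature's decoder vector into one equal-length chunk per head and report each chunk's norm as a share of the total. The functions `head_attribution` and `top_heads`, and the chunk layout, are hypothetical and used only for illustration.

```python
import math


def head_attribution(decoder_vec, n_heads):
    """Hypothetical per-head attribution: each head's share of the decoder-vector norm.

    Assumes decoder_vec is the concatenation of n_heads equal-length chunks,
    one per attention head. This is an assumption, not a documented method.
    """
    if len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder_vec length must be divisible by n_heads")
    d_head = len(decoder_vec) // n_heads
    # L2 norm of each head's slice of the decoder vector
    norms = [
        math.sqrt(sum(x * x for x in decoder_vec[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    # Normalise so the shares sum to 1 (all zeros if the vector is zero)
    return [n / total if total > 0 else 0.0 for n in norms]


def top_heads(weights, k=3):
    """Return the indices of the k heads with the largest attribution weights."""
    return sorted(range(len(weights)), key=lambda h: weights[h], reverse=True)[:k]


# The weights listed above, indexed by head number
listed = [0.08, 0.03, 0.18, 0.07, 0.16, 0.05, 0.03, 0.04, 0.05, 0.17, 0.06, 0.03]
print(top_heads(listed))  # [2, 9, 4]
```

Ranking the listed values this way picks out heads 2, 9 and 4 (0.18, 0.17 and 0.16). Together these three heads carry roughly half of the listed weight.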
Negative Logits
acea          -1.37
independents  -1.24
idays         -1.22
ModLoader     -1.21
ゴ            -1.19
Topics        -1.11
anical        -1.11
Tang          -1.10
Reviewer      -1.10
CLE           -1.09
Positive Logits
ğ      1.48
uty    1.32
iste   1.29
uve    1.27
igham  1.27
Ré     1.26
ault   1.24
uten   1.22
chal   1.22
Fey    1.18
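The positive and negative logit lists name the output tokens whose logits the feature pushes up or down most. A common way to produce such lists, assumed here rather than confirmed by the dashboard, is to project the feature's output direction in the residual stream onto the unembedding matrix. The largest and smallest entries are then kept. The function names `logit_effects` and `top_bottom_tokens`, the toy vocabulary, and the toy matrix are all hypothetical.

```python
def logit_effects(direction, W_U):
    """Project a residual-stream direction onto the unembedding matrix.

    direction: list of d_model floats (the feature's assumed output direction).
    W_U: d_model rows, each a list of vocab_size floats.
    Returns one logit effect per vocabulary token (direction @ W_U).
    """
    vocab_size = len(W_U[0])
    return [
        sum(direction[i] * W_U[i][t] for i in range(len(direction)))
        for t in range(vocab_size)
    ]


def top_bottom_tokens(effects, vocab, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary
vocab = ["a", "b", "c", "d"]
W_U = [[1.0, -1.0, 0.5, 0.0],
       [0.0, 2.0, -2.0, 1.0]]
effects = logit_effects([1.0, 0.5], W_U)
print(effects)  # [1.0, 0.0, -0.5, 0.5]
positive, negative = top_bottom_tokens(effects, vocab, 2)
print(positive)  # [('a', 1.0), ('d', 0.5)]
print(negative)  # [('c', -0.5), ('b', 0.0)]
```

For a feature read from attention outputs, the direction would first need to be mapped into the residual stream (for example through the output projection). This sketch leaves that step out.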
Activation Density: 0.004%
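Activation density is the percentage of token positions on which the feature fires. A density of 0.004% is 0.00004 as a fraction, or about 4 of every 100,000 tokens. The sketch below assumes "fires" means a strictly positive activation, as it would with a ReLU-style SAE. The function name `activation_density` is hypothetical.

```python
def activation_density(activations):
    """Percentage of token positions where the feature's activation is > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# 4 active positions out of 100,000 tokens gives 0.004%
acts = [0.0] * 100_000
for i in (10, 2_000, 30_000, 99_999):
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.004%
```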