INDEX
Explanations
specific Eastern European names and cultural references
Head Attr Weights
0:0.05
1:0.03
2:0.11
3:0.08
4:0.04
5:0.03
6:0.41
7:0.02
8:0.04
9:0.05
10:0.05
11:0.04
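As a rough illustration of how the head attribution weights above might be read, the following sketch ranks the heads by weight and reports the share held by the largest one. The values are copied from the listing; the names `head_weights` and `rank_heads` are hypothetical and are not part of any dashboard API. The listed weights add up to about 0.95 rather than 1, presumably because of rounding, so the share is computed against the listed total.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.05, 1: 0.03, 2: 0.11, 3: 0.08, 4: 0.04, 5: 0.03,
    6: 0.41, 7: 0.02, 8: 0.04, 9: 0.05, 10: 0.05, 11: 0.04,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]

# Sum of the listed weights; used as the denominator for the share below.
total = sum(head_weights.values())
top_share = top_weight / total

print(f"top head: {top_head}, weight: {top_weight:.2f}, share of listed total: {top_share:.1%}")
for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f}")
```

On these numbers the ranking is led by head 6, followed by heads 2 and 3.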
Negative Logits
acebook              -1.46
retty                -1.44
edIn                 -1.37
ongyang              -1.34
yout                 -1.33
natureconservancy    -1.32
ecause               -1.29
emouth               -1.29
zona                 -1.29
ModLoader            -1.25
Positive Logits
���                  1.89
afort                1.43
amic                 1.39
ilege                1.32
�                    1.31
�                    1.30
arching              1.24
ּ                    1.24
ondo                 1.21
McGee                1.19
Activations Density 0.001%