INDEX
Explanations
references to East Asian culture or businesses
Head Attr Weights
0:0.01
1:0.01
2:0.10
3:0.08
4:0.06
5:0.02
6:0.12
7:0.06
8:0.05
9:0.05
10:0.29
11:0.10
Negative Logits
manoeuv: -1.56
glide: -1.54
oufl: -1.52
wich: -1.42
oglu: -1.42
consec: -1.41
disgu: -1.41
aders: -1.40
udeau: -1.40
stad: -1.39
Positive Logits
��: 1.82
Meta: 1.67
Submit: 1.57
forum: 1.50
gements: 1.48
Editorial: 1.47
editorial: 1.46
videos: 1.46
internet: 1.46
Upload: 1.45
Activations Density 0.000%