INDEX
Explanations
references to lists or categories
Head Attr Weights
0:0.04
1:0.02
2:0.09
3:0.07
4:0.08
5:0.03
6:0.12
7:0.20
8:0.03
9:0.04
10:0.15
11:0.06
Negative Logits
ilet:-1.73
otine:-1.69
edin:-1.68
unte:-1.67
ombat:-1.60
ews:-1.56
chat:-1.55
Quit:-1.54
isec:-1.52
osh:-1.51
Positive Logits
Glac:1.61
1954:1.59
1946:1.57
Lands:1.54
1953:1.53
1935:1.53
eatures:1.50
1952:1.47
lust:1.45
Franz:1.45
Activations Density 0.000%