INDEX
Explanations
date and time information
Head Attribution Weights
0:0.02
1:0.02
2:0.13
3:0.13
4:0.11
5:0.03
6:0.05
7:0.12
8:0.04
9:0.06
10:0.07
11:0.16
Negative Logits
Leaks        -1.33
milo         -1.25
Kardash      -1.23
Labrador     -1.22
encourages   -1.18
hides        -1.18
refers       -1.17
hump         -1.16
han          -1.15
graph        -1.14
Positive Logits
��極         1.53
omore        1.49
eers         1.48
pole         1.46
bj           1.41
atana        1.37
�士          1.34
endo         1.33
��           1.32
Ips          1.28
Activations Density 0.001%