INDEX
Explanations
actions and expressions indicating agreement or acknowledgment
Negative Logits
loroethene   -0.53
цездатний    -0.52
Lep          -0.48
ruik         -0.44
cuit         -0.43
Lep          -0.43
bans         -0.42
arj          -0.41
bino         -0.41
lü           -0.41
Positive Logits
head         1.67
heads        1.55
Head         1.53
Head         1.47
head         1.42
HEAD         1.39
Heads        1.34
Heads        1.27
heads        1.23
HEAD         1.20
Activations Density 0.111%
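
As a rough illustration of how to read the density figure, the sketch below assumes that activation density means the fraction of token positions on which the feature fires above some threshold. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; the dashboard may compute it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 111 of 100,000 token positions.
sample = [0.0] * 99_889 + [1.5] * 111
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.111% would correspond to the feature being active on roughly 1 in every 900 tokens.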