INDEX
Explanations
greetings and instructions for action
requests or commands for action
Negative Logits
76561     -0.76
arthed    -0.75
bent      -0.71
lings     -0.68
é¾        -0.67
byss      -0.67
Balt      -0.67
aler      -0.67
Aust      -0.66
Osw       -0.65
Positive Logits
forgive    1.22
enable     1.17
note       1.16
refrain    1.14
excuse     1.13
pardon     1.07
consider   1.06
refer      1.05
don        1.04
remember   1.00
Activations Density: 0.046%