Explanations
strongly emphasized directives and expressions of strong feelings or eventualities
Negative Logits
imir          -0.15
ires          -0.14
led           -0.14
ire           -0.14
udos          -0.14
zig           -0.14
/her          -0.14
Higgins       -0.13
_RESERVED     -0.13
/she          -0.13
Positive Logits
LLLL           0.27
LY             0.25
YYY            0.25
'S             0.24
YYYY           0.23
’S             0.23
LLL            0.23
ISTIC          0.22
/OR            0.22
OO             0.20
Activations Density 0.077%