Explanations
phrases indicating size, quality, or specifics of substances and experiences
Negative Logits
instance    -0.15
thing       -0.15
iej         -0.14
UBLE        -0.14
iente       -0.14
omaly       -0.14
olate       -0.14
aday        -0.14
ffects      -0.14
ÑĩиÑģл      -0.14
Positive Logits
vengeance   0.34
emphasis    0.32
twist       0.28
Twist       0.28
focus       0.27
regards     0.25
bang        0.24
bang        0.23
regard      0.23
emphasis    0.23
Activations Density 0.103%