INDEX
Explanations
phrases related to self-expression and personal beliefs
sentences that express statements or assertions
Negative Logits
kicking     -0.58
departing   -0.58
kinson      -0.57
nen         -0.55
kindly      -0.55
ongo        -0.53
captcha     -0.53
omen        -0.52
kicks       -0.52
ema         -0.52
Positive Logits
Finally     1.03
And         0.99
Lastly      0.97
He          0.89
etc         0.87
They        0.84
Heck        0.80
Finally     0.79
And         0.78
etc         0.78
Activations Density 0.375%