INDEX
Explanations
sentences referring to praise or criticism
sentences or statements that convey a sense of completion or finality
Negative Logits
uer           -0.89
oun           -0.77
mate          -0.70
consolation   -0.68
isher         -0.68
equival       -0.68
broader       -0.67
uers          -0.67
nons          -0.66
hay           -0.66
Positive Logits
Especially     1.22
Firstly        1.14
Its            1.11
Whereas        1.08
Literally      1.07
Whether        1.04
Whilst         1.03
Particularly   1.01
Having         1.00
Typically      0.97
Activations Density: 0.566%