Explanations
phrases related to evaluation or comparison
phrases indicating subjective opinions or beliefs
Negative logits            Positive logits
Reviewer       -0.90       referring      1.23
Beautiful      -0.64       refers         1.03
cz             -0.63       referencing    1.03
superb         -0.62       meant          1.03
unim           -0.61       merely         0.97
undefeated     -0.60       signify        0.91
wonder         -0.60       mean           0.87
irresistible   -0.59       implying       0.85
ellen          -0.59       just           0.83
uries          -0.59       refer          0.83
Activations Density 0.819%