Explanations
statements expressing opinions or evaluations about various topics
opinions and subjective statements expressed in the text
Negative Logits
Token                             Logit
allegedly                         -0.68
hor                               -0.66
Written                           -0.64
ÄŁ                                -0.63
supposed                          -0.60
supposedly                        -0.59
sham                              -0.58
rawdownloadcloneembedreportprint  -0.58
WB                                -0.58
!/                                -0.58
Positive Logits
Token       Logit
pole        0.71
uce         0.67
arden       0.67
geist       0.65
cott        0.64
Rampage     0.64
Edward      0.63
congr       0.63
xus         0.63
goodbye     0.62
Activations Density 0.053%