INDEX
Explanations
sentences or phrases indicating agreement with statements or opinions
expressions of agreement or consensus
Negative Logits
gallery        -0.71
asus           -0.68
hyde           -0.63
sembly         -0.61
oufl           -0.58
inary          -0.58
hung           -0.57
ccording       -0.56
emergencies    -0.54
ague           -0.54
Positive Logits
vehemently      0.98
with            0.89
WITH            0.78
passionately    0.77
strongly        0.71
unanimously     0.70
lihood          0.70
abl             0.69
wi              0.68
entirely        0.67
Activations Density 0.049%