INDEX
Explanations
phrases indicating agreement or alignment with a position or statement
expressions of agreement
Negative Logits
Token       Logit
oufl        -0.74
glands      -0.73
crow        -0.71
Tycoon      -0.69
typh        -0.69
vas         -0.66
Blooming    -0.65
tremend     -0.60
brid        -0.58
Mongolia    -0.58
Positive Logits
Token         Logit
rences        0.84
unanimously   0.82
reements      0.80
ipeg          0.80
agree         0.78
ably          0.76
lihood        0.76
reement       0.75
agreement     0.74
ettle         0.73
Activations Density 0.024%