INDEX
Explanations
statements where agreement or alignment is expressed
expressions of agreement
Negative Logits
sembly        -0.73
asus          -0.71
hyde          -0.68
gallery       -0.67
ccording      -0.66
ò             -0.64
oufl          -0.62
inary         -0.62
agin          -0.61
crow          -0.60
Positive Logits
with          0.90
vehemently    0.85
unanimously   0.76
strongly      0.76
SOURCE        0.72
atively       0.70
wi            0.68
WITH          0.66
passionately  0.66
ohn           0.65
Activations Density 0.036%