INDEX
Explanations
phrases indicating approval or acceptance
expressions conveying approval or acceptance
Negative Logits
marine   -0.86
chet     -0.79
hani     -0.77
riber    -0.68
cano     -0.68
ulhu     -0.68
arcity   -0.66
raped    -0.63
bane     -0.62
quin     -0.62
Positive Logits
AY        0.83
lahoma    0.78
okay      0.78
alright   0.71
ably      0.68
ol        0.66
BUT       0.66
enough    0.65
ok        0.64
OK        0.64
Activations Density: 0.023%