INDEX
Explanations
statements expressing opinions or beliefs
assertions and claims about opinions or beliefs
Negative Logits
hander      -0.71
ritten      -0.68
phrine      -0.68
ゴン        -0.67
atri        -0.67
bid         -0.67
written     -0.67
recorded    -0.65
pin         -0.64
scanned     -0.63
Positive Logits
otherwise       1.18
disrespect      0.87
excuses         0.79
anything        0.76
baseless        0.73
unreasonable    0.70
incorrectly     0.69
dismissing      0.68
neglect         0.68
criticizing     0.67
Activations Density 0.271%