Explanations
phrases indicating clarity or certainty
instances of the word "obvious."
Negative Logits
nan          -0.86
rams         -0.77
iership      -0.74
tightly      -0.66
ILCS         -0.64
ingers       -0.64
monitored    -0.63
ander        -0.63
ching        -0.62
uden         -0.61
Positive Logits
obvious      1.03
iary         0.83
contrad      0.73
Leilan       0.73
tale         0.73
Ùĩ           0.72
culprit      0.71
signs        0.70
\\\\\\\\     0.70
Signs        0.69
Activations Density 0.009%