INDEX
Explanations
phrases indicating clear conclusions or evaluations
instances of the word "clearly" emphasizing transparency or obviousness in statements
Negative Logits
Token           Logit
uese            -0.79
aily            -0.76
oleon           -0.70
anish           -0.68
umption         -0.68
hell            -0.68
awaru           -0.68
lav             -0.67
urch            -0.67
rost            -0.67
Positive Logits
Token           Logit
deline          0.97
marked          0.84
identifiable    0.82
distinguish     0.80
differentiated  0.78
readable        0.75
articulated     0.74
outwe           0.74
differentiate   0.74
spelled         0.73
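The dashboard does not state how the negative and positive logit lists are computed. One common approach, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest token scores. The sketch below shows that idea with a toy 2-dimensional model and 4-token vocabulary. The function name `top_logit_effects` and all arrays are hypothetical and for illustration only.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumptions (hypothetical, for illustration):
      w_dec_row: (d_model,) decoder direction of a single feature
      w_unembed: (d_model, vocab_size) unembedding matrix
      vocab:     list of vocab_size token strings
    """
    # Direct effect of the feature direction on every token's logit.
    scores = w_dec_row @ w_unembed
    # Indices sorted by score, ascending (most negative first).
    order = np.argsort(scores)
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with made-up weights, not taken from any real model.
vocab = ["a", "b", "c", "d"]
w_unembed = np.array([[1.0, -1.0, 0.5, 0.0],
                      [0.0,  0.0, 0.0, 1.0]])
w_dec_row = np.array([1.0, 0.2])
neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(neg, pos)
```

Because the score is a plain dot product, subword tokens such as "uese" or "outwe" can rank highly even when they never appear in the feature's activating contexts. This is one reason these lists are usually read alongside the explanations rather than on their own.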
Activation Density 0.027%
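Activation density is typically the fraction of evaluated tokens on which the feature fires. A density of 0.027% therefore corresponds to roughly 27 firing tokens per 100,000, so this is a sparse feature. The sketch below assumes "fires" means an activation strictly above a threshold of 0. That threshold and the helper name `activation_density` are hypothetical; the dashboard may use a different cutoff or token sample.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    acts: 1-D array of one feature's activations over a token sample.
    threshold: assumed firing cutoff (0.0 here, which is hypothetical).
    """
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Toy sample: 100,000 tokens, the feature fires on 27 of them.
acts = np.zeros(100_000)
acts[:27] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")
```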