INDEX
Explanations
phrases related to comparison or contrast
references to the word "well."
Negative Logits
anos      -0.78
pid       -0.65
anon      -0.64
mare      -0.62
ierce     -0.62
anus      -0.61
zan       -0.61
absolute  -0.61
mid       -0.60
agara     -0.60
Positive Logits
evidenced   0.61
optionally  0.60
NESS        0.60
onse        0.59
umenthal    0.58
FTWARE      0.58
insofar     0.57
Label       0.57
vers        0.57
possibly    0.56
Activations Density: 0.030%