INDEX
Explanations
comparisons and evaluations in text
comparisons or contrasts between different subjects or ideas
Negative Logits
interrupted  -0.71
Leod         -0.71
arten        -0.67
qui          -0.66
dimension    -0.66
iol          -0.65
panic        -0.63
hes          -0.60
arching      -0.57
letters      -0.57
Positive Logits
considering  0.86
Gaw          0.73
honestly     0.73
}:           0.70
Especially   0.70
especially   0.69
imagine      0.67
folks        0.66
WM           0.66
compared     0.66
Activations Density 0.451%