INDEX
Explanations
sentences expressing certainty or clarity
phrases indicating certainty or clarity about a situation
Negative Logits
Token       Logit
mouth       -0.80
aukee       -0.78
andem       -0.76
cience      -0.68
rack        -0.65
izont       -0.63
uminati     -0.61
UV          -0.60
utorial     -0.60
aciously    -0.60
Positive Logits
Token       Logit
whoever      0.92
there        0.76
none         0.69
omission     0.69
these        0.69
nobody       0.68
although     0.67
someone      0.66
we           0.65
they         0.65
Activations Density: 0.189%