INDEX
Explanations
phrases indicating certainty or strong possibility
statements indicating certainty or conclusions about various topics
Negative Logits
ãĥīãĥ©    -0.78
arak      -0.74
à¦        -0.68
mouth     -0.67
usk       -0.67
andem     -0.67
rack      -0.67
Zone      -0.66
oses      -0.66
cience    -0.65
Positive Logits
whoever   1.02
there     0.81
these     0.78
somebody  0.75
someone   0.75
they      0.74
nobody    0.73
although  0.69
we        0.69
none      0.69
Activations Density 0.211%
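
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the fraction of sampled token positions on which the feature fires above a threshold; that definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all illustrative assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature is active. The page itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, of which 2 activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.211% would correspond to the feature firing on roughly 2 of every 1000 sampled tokens.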