INDEX
Explanations
phrases indicating certainty or importance associated with concepts or statements
Negative Logits
略       -0.15
almost   -0.15
encion   -0.15
idir     -0.14
hazi     -0.14
ít       -0.14
almost   -0.14
ais      -0.14
halt     -0.14
stalk    -0.14
Positive Logits
necessarily    0.24
coincidence    0.23
particularly   0.23
stretch        0.21
terribly       0.21
matter         0.20
problem        0.20
anymore        0.20
issue          0.19
isolated       0.19
Activations Density 0.064%