INDEX
Explanations
phrases indicating the beginning or completion of a task or discussion
phrases related to being out of the ordinary or unconventional situations
Negative Logits
utical   -0.69
raud     -0.66
cum      -0.65
ãĥį      -0.65
lege     -0.63
ILA      -0.63
ume      -0.62
OK       -0.61
oster    -0.61
olo      -0.61
Positive Logits
equation  1.02
gate      0.98
frying    0.90
closet    0.89
loop      0.84
box       0.84
gates     0.83
woods     0.81
realm     0.80
fray      0.77
Activations Density: 0.063%