INDEX
Explanations
phrases or sentences that indicate strong emphasis or comparison
Negative Logits
axter       -0.65
ãĤ¿         -0.63
kefeller    -0.63
bath        -0.63
icides      -0.61
kai         -0.61
HCR         -0.61
iday        -0.60
tions       -0.58
undo        -0.58
Positive Logits
partially       0.78
partly          0.78
SOME            0.73
uner            0.69
toler           0.68
pretend         0.67
temporarily     0.66
hap             0.65
theoretically   0.65
lik             0.65
Activations Density: 1.122%