INDEX
Explanations
phrases indicating speculation, understanding, or assurance
expressions of prediction or assumption
Negative Logits
hypers      -0.69
bats        -0.67
raviolet    -0.67
mutants     -0.64
Kik         -0.63
showc       -0.63
Ambro       -0.63
Baltimore   -0.63
ilty        -0.62
decom       -0.61
Positive Logits
ħĭ           0.84
idate        0.78
ĵĺ           0.77
uate         0.75
firsthand    0.72
rue          0.68
alogy        0.66
atos         0.65
myself       0.65
confidently  0.64
Activations Density 0.122%