INDEX
Explanations
phrases indicating a lack of knowledge or understanding
expressions of uncertainty or lack of knowledge
Negative Logits
semble         -0.77
nette          -0.74
visor          -0.72
inka           -0.71
cedes          -0.71
rup            -0.71
ĪĴ             -0.70
ammers         -0.70
conservancy    -0.70
raint          -0.68
Positive Logits
whatsoever     1.07
èª             0.81
squat          0.78
guessing       0.71
guesses        0.70
how            0.66
whats          0.66
UTH            0.65
WATCHED        0.64
glimps         0.64
Activations Density 0.026%
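To give a sense of scale for the density figure, the sketch below assumes that "activation density" means the fraction of token positions on which the feature fires (activation above zero). The dashboard does not state this definition, so treat it as an assumption. The function name `activation_density` and the sample data are hypothetical, chosen only so the arithmetic reproduces 0.026%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is defined as (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 26 of 100,000 token positions.
sample = [1.5] * 26 + [0.0] * 99_974
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.026% corresponds to the feature firing on roughly 26 out of every 100,000 tokens.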