Explanations
phrases or questions expressing uncertainty or curiosity about a situation
Negative Logits
hoffe         -0.70
jaus          -0.64
OfYear        -0.63
føl           -0.62
mostrarse     -0.61
niž           -0.61
vuitton       -0.58
Klass         -0.58
ptăm          -0.57
bevis         -0.57
Positive Logits
What          0.99
What          0.98
what          0.96
what          0.96
WHAT          0.94
WHAT          0.93
AndEndTag     0.77
GEBURTSDATUM  0.76
DECREF        0.67
OMITBAD       0.66
Activations Density: 0.129%