INDEX
Explanations
expressions of surprise, skepticism, or emotional reactions
Negative Logits
ngth         -0.76
elta         -0.74
ictionary    -0.70
nomine       -0.69
semble       -0.67
rive         -0.66
rou          -0.66
erville      -0.65
reau         -0.65
ioxide       -0.65
POSITIVE LOGITS
imaru        0.93
considering  0.93
seeing       0.86
why          0.72
how          0.69
Stras        0.66
why          0.65
because      0.63
SPONSORED    0.62
because      0.62
Activations Density 0.290%