INDEX
Explanations
phrases related to personal opinions or reactions
phrases that emphasize personal opinions or statements
Negative Logits
Token      Logit
neau       -0.72
ゴン       -0.70
ocaust     -0.70
imeter     -0.68
fee        -0.67
estones    -0.65
oided      -0.63
ammers     -0.62
enne       -0.62
orthy      -0.61
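
One token in the negative-logit list appears in the source as `ãĤ´ãĥ³`. This looks like the byte-level BPE vocabulary form of a non-ASCII token rather than random corruption. The sketch below shows how such a string can be mapped back to readable text, assuming the model uses a GPT-2-style byte-to-unicode table; that tokenizer choice is an assumption, since the page does not name it. `bytes_to_unicode` follows the well-known GPT-2 construction, and `decode_token` is a hypothetical helper name introduced here for illustration.

```python
def bytes_to_unicode():
    """Build the GPT-2-style map from each byte value (0-255) to a printable character."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, map(chr, code_points)))


def decode_token(token):
    """Hypothetical helper: turn a byte-level vocabulary string back into text."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Tokens can end mid-character, so replace undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãĤ´ãĥ³"))
```

Under that assumption the string decodes to the katakana pair ゴン. Plain ASCII tokens such as `why` pass through unchanged, and a leading `Ġ` in a vocabulary string stands for a leading space.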
Positive Logits
Token      Logit
why        1.44
what       1.28
how        1.24
where      1.05
WHY        1.00
whats      1.00
what       0.97
exactly    0.91
why        0.89
gonna      0.80
Activations Density 0.105%