INDEX
Explanations
phrases indicating determination or decision
Negative Logits
Token       Logit
aples       -0.67
ernels      -0.66
WIND        -0.61
archives    -0.60
ilers       -0.59
Kis         -0.58
wine        -0.58
heed        -0.57
asus        -0.57
illi        -0.57
Positive Logits
Token       Logit
anywhere    1.32
anymore     1.10
nor         0.90
slightest   0.87
anytime     0.86
any         0.84
overboard   0.84
whatsoever  0.81
unnoticed   0.80
lightly     0.77
Activations Density: 0.048%