INDEX
Explanations
adverbs expressing a level of certainty or confidence
phrases indicating agreement or affirmation
Negative Logits
hend          -0.77
ourses        -0.73
endant        -0.65
endants       -0.65
ert           -0.64
¿½            -0.62
Application   -0.62
vertisement   -0.62
(\            -0.61
heid          -0.60
Positive Logits
kidding       0.84
literally     0.76
boring        0.65
REALLY        0.65
shit          0.64
Asgard        0.63
blah          0.62
exc           0.62
disposable    0.62
cursed        0.61
Activations Density 0.260%