Explanations
punctuation that indicates surprise or emphasis
rhetorical questions and expressions of incredulity
Negative Logits
emaker    -0.84
oreal     -0.83
izons     -0.77
onds      -0.73
20439     -0.72
nai       -0.68
pkg       -0.68
arus      -0.68
alyses    -0.66
ema       -0.65
Positive Logits
Pry        0.73
Fantastic  0.72
Percy      0.66
Anyway     0.65
guess      0.65
fuck       0.65
Hick       0.64
Sammy      0.64
swear      0.63
Pf         0.63
Activations Density 0.322%