INDEX
Explanations
expressions of confusion or frustration in conversations
Negative Logits
ha        -0.15
ù         -0.14
facult    -0.14
(?)       -0.14
591       -0.14
Ramsey    -0.13
amd       -0.13
Gale      -0.13
prox      -0.13
ham       -0.13
Positive Logits
shit      0.59
crap      0.54
shit      0.48
crap      0.43
garbage   0.42
rubbish   0.40
BS        0.39
sh        0.35
junk      0.35
nonsense  0.35
Activations Density 0.246%