INDEX
Explanations
negative expressions and descriptions of discomfort or adversity
Negative Logits
ignon      -0.16
utton      -0.16
UTTON      -0.16
Dent       -0.15
fav        -0.14
Dialogue   -0.14
Ire        -0.13
ìłľ        -0.13
Dialog     -0.13
ets        -0.13
Positive Logits
hack       0.17
jean       0.16
hack       0.16
kola       0.15
ivid       0.15
hacks      0.15
ECTOR      0.14
Struct     0.14
aces       0.14
EO         0.14
Activations Density: 1.474%