Explanations
expressions of dislike or negative feelings towards subjects or concepts
Negative Logits
ally      -0.15
essler    -0.15
cap       -0.15
orsch     -0.15
612       -0.14
aily      -0.14
pose      -0.14
Rica      -0.14
φων       -0.14
245       -0.14
Positive Logits
rank       0.16
afen       0.14
akter      0.14
unist      0.14
Shapiro    0.14
Вики       0.14
bounce     0.13
.preview   0.13
tuổi       0.13
askell     0.13
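Dashboards of this kind usually get such logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the highest- and lowest-scoring vocabulary entries. The sketch below assumes that approach, ignores the final layer norm, and uses hypothetical names (`logit_effects`, `top_and_bottom`) and toy data; it is not the page's actual pipeline.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every vocabulary entry by decoder_direction @ unembed.

    Assumption: unembed is d_model x vocab_size (one row per residual dimension),
    and final layer norm is ignored.
    """
    vocab_size = len(vocab)
    scores = [0.0] * vocab_size
    # Accumulate the dot product of the decoder vector with each vocab column.
    for d_i, row in zip(decoder_direction, unembed):
        for j in range(vocab_size):
            scores[j] += d_i * row[j]
    return list(zip(vocab, scores))


def top_and_bottom(
    pairs: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, descending) and (k most negative, ascending)."""
    ranked = sorted(pairs, key=lambda p: p[1])
    return ranked[-k:][::-1], ranked[:k]


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
pairs = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.0], [0.1, 0.2, -0.4]], ["a", "b", "c"])
positive, negative = top_and_bottom(pairs, k=1)
print(positive, negative)
```

With real model weights, `k=10` would yield two ten-entry lists like the ones shown above.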
Activation Density: 0.023%
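Activation density is commonly reported as the percentage of sampled tokens on which the feature's activation is above zero. A minimal sketch, assuming a strict `> 0` threshold and a flat list of per-token activations (the function name `activation_density` is hypothetical):

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 23 active tokens out of 100,000 sampled tokens corresponds to 0.023%.
sample = [0.0] * 99_977 + [1.0] * 23
print(activation_density(sample))
```

At this density the feature is very sparse: roughly one token in every 4,300 triggers it.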