INDEX
Explanations
references to dolls
mentions of dolls
Negative Logits
Token               Logit
EED                 -0.81
TAIN                -0.74
ãĥ¥                 -0.72
NING                -0.68
GGGGGGGG            -0.65
RAW                 -0.64
riott               -0.64
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    -0.61
forc                -0.61
Unified             -0.61
Positive Logits
Token               Logit
doll                1.10
dolls               1.07
maker               0.95
wright              0.87
ophone              0.82
Doll                0.80
endor               0.78
figur               0.77
oru                 0.76
makers              0.76
Activations Density 0.019%