Explanations
References to the name "Margaret."
Negative Logits
ivil        -0.17
cơ          -0.17
ernet       -0.16
andas       -0.15
Meadows     -0.15
кта         -0.15
enburg      -0.15
hall        -0.14
651         -0.14
langs       -0.14
Positive Logits
son          0.19
amma         0.16
tright       0.15
oren         0.15
orc          0.15
otos         0.15
ty           0.15
cone         0.14
boro         0.14
utes         0.14
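The two logit lists can be reproduced with a sketch like the following. It assumes (the dashboard does not say so) that each value is the feature's decoder direction projected through the model's unembedding matrix, i.e. the feature's direct effect on each output token's logit. The function names `logit_effects` and `top_logits`, the d_model x vocab layout of `unembed`, and the toy numbers are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Direct logit contribution of a feature to every vocabulary token.

    Hypothetical layout: unembed[i][j] is the weight from residual
    dimension i to token j (d_model rows x vocab_size columns).
    """
    effects: Dict[str, float] = {}
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        effects[token] = sum(decoder_dir[i] * unembed[i][j]
                             for i in range(len(decoder_dir)))
    return effects


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return negative, positive


# Toy data, made up for illustration (not taken from the real model).
vocab = ["son", "amma", "ivil", "hall"]
decoder_dir = [1.0, 0.5]
unembed = [[0.2, 0.1, -0.1, 0.0],
           [-0.02, 0.1, -0.14, -0.2]]

negative, positive = top_logits(logit_effects(decoder_dir, unembed, vocab), k=2)
print("negative:", [(t, round(v, 2)) for t, v in negative])
print("positive:", [(t, round(v, 2)) for t, v in positive])
```

Sorting once and slicing both ends yields both lists from a single pass; with a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` would avoid a full sort.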
Activation density: 0.005% (about 1 in every 20,000 tokens)
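Activation density is the fraction of tokens on which the feature fires. The sketch below assumes a token counts as "active" when its activation is strictly above a threshold of 0; the dashboard does not state its exact threshold, and the function name `activation_density` and the toy activations are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Toy example: the feature fires on 1 token out of 20,000.
acts = [0.0] * 20_000
acts[123] = 3.2
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the feature is very sparse, which fits a narrow trigger such as a single first name.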