Explanations
mentions of a specific person named Kim
mentions of the name "Kim."
Negative Logits
Downloadha  -0.83
ttes  -0.69
ALLY  -0.69
OUGH  -0.69
BACK  -0.66
UCT  -0.65
Ú  -0.64
naires  -0.64
514  -0.63
̶  -0.61
Positive Logits
Jong  1.14
Kardashian  1.04
oji  1.01
ball  0.96
pton  0.96
aeper  0.94
sey  0.86
py  0.84
ney  0.84
ェ (raw byte-level form ãĤ§)  0.83
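On sparse-autoencoder dashboards of this kind, the logit lists are usually produced by projecting the feature's decoder direction through the model's unembedding matrix and then reading off the most positive and most negative entries. The sketch below assumes that convention. `top_logit_effects`, `decoder_direction`, and `W_U` are hypothetical names. The toy matrix and vocabulary are invented, so the numbers do not correspond to this feature.

```python
import numpy as np

def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature's decoder direction raises or
    lowers their logits.

    Assumes (hypothetically) decoder_direction has shape (d_model,) and
    W_U has shape (d_model, vocab_size).
    """
    # One score per vocabulary token: the direct effect on that token's logit.
    logit_effects = decoder_direction @ W_U
    # Sort once in ascending order; the two ends give the two lists.
    order = np.argsort(logit_effects)
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Invented toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, 0.5, -0.5, -0.2],
                [0.2, 0.1, -0.3, -0.4]])
direction = np.array([1.0, 1.0])
neg, pos = top_logit_effects(direction, W_U, vocab, k=2)
print(pos)  # tokens "a" and "b", largest effect first
print(neg)  # tokens "c" and "d", most negative effect first
```

A single `np.argsort` followed by slicing from both ends gives both lists in one pass over the vocabulary.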
Activation Density: 0.019%
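Activation density is the fraction of token positions in a sample corpus on which the feature is active. A density of 0.019% works out to about 19 activations per 100,000 tokens, or roughly one in 5,300. The sketch below assumes that "active" means an activation strictly above zero. The names `activation_density` and `acts` are hypothetical, and `acts` stands in for this feature's per-token activations.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds
    threshold (assumed here to be 0)."""
    activations = np.asarray(activations)
    # Boolean mask of firing positions; its mean is the firing fraction.
    return float(np.mean(activations > threshold))

# Invented toy corpus: 10,000 token positions, feature fires on 2 of them.
acts = np.zeros(10_000)
acts[[17, 4242]] = [3.1, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.020%
```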