INDEX
Explanations
Nelson Mandela, Jackie Robinson
Negative Logits
in       0.88
t        0.83
i        0.78
↵↵       0.77
v        0.76
kesk     0.75
id       0.73
r        0.71
ק        0.70
jurul    0.67
Positive Logits
Pressed        0.64
Фургала        0.61
textepsilon    0.60
ajte           0.59
<unused601>    0.59
Mord           0.58
Виктор         0.58
Shivam         0.58
Forge          0.56
젝트           0.56
Activations Density 0.001%