INDEX
Explanations
lieutenant followed by rank
Negative Logits
as    -3.33
In    -2.95
u     -2.88
N     -2.86
an    -2.86
in    -2.80
2     -2.61
J     -2.58
er    -2.55
Q     -2.55
Positive Logits
蜮    2.58
'     2.55
🅃    2.53
🅣    2.42
芣    2.38
𓁹    2.30
颏    2.22
(not rendered)    2.19
圌    2.17
𝕿    2.13
Activations Density 0.004%