INDEX
Explanations
names and familial relationships within the text
Negative Logits
iaux      -0.15
BÃĸL      -0.15
ÃľNİ      -0.15
XPAR      -0.15
λια       -0.14
aea       -0.14
PÅĺ       -0.14
unnable   -0.14
TRGL      -0.13
YYS       -0.13
Positive Logits
A   0.44
E   0.39
J   0.36
M   0.36
C   0.34
S   0.32
P   0.31
L   0.30
R   0.29
D   0.28
Activations Density 0.470%