Explanations
references to evidence or validation of claims
NEGATIVE LOGITS
/Area     -0.14
invert    -0.14
prene     -0.14
께        -0.14
quirer    -0.14
ीन        -0.14
kir       -0.14
odge      -0.14
urs       -0.14
uteur     -0.14
POSITIVE LOGITS
reading    0.34
reader     0.28
ed         0.24
positive   0.23
iness      0.22
read       0.22
-positive  0.22
Positive   0.20
ing        0.20
READING    0.20
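The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. One common way such lists are produced is to multiply the feature's decoder direction by the model's unembedding matrix and keep the bottom-k and top-k entries. This page does not confirm that method, so treat it as an assumption. Real pipelines may also fold in the final LayerNorm, which this sketch ignores. The helper `logit_effects` and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_row: Sequence[float],          # feature's decoder direction, length d_model
    unembed: Sequence[Sequence[float]],    # unembedding matrix, shape [d_model][vocab]
    vocab: Sequence[str],                  # token strings, length vocab
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: score each token by decoder_row @ unembed.

    Returns (k most negative, k most positive) as (token, score) pairs.
    Ignores the final LayerNorm and any bias terms (assumption).
    """
    d_model = len(decoder_row)
    # One dot product per vocabulary column.
    scores = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked[-k:]))  # largest score first
    return negative, positive


# Toy example with a 2-dimensional residual stream and 4 tokens (made-up values).
neg, pos = logit_effects(
    decoder_row=[1.0, 0.5],
    unembed=[[0.3, 0.2, -0.1, -0.4],
             [0.1, 0.2, -0.1, 0.0]],
    vocab=["reading", "reader", "invert", "kir"],
    k=2,
)
print([t for t, _ in neg])  # ['kir', 'invert']
print([t for t, _ in pos])  # ['reading', 'reader']
```

Plain lists keep the sketch dependency-free. With a real model the same step is a single matrix product of the decoder row and the unembedding matrix, followed by a top-k selection.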
Activation density: 0.017%
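A density of 0.017% means the feature fires on roughly 17 of every 100,000 tokens. Below is a minimal sketch of that calculation. It assumes density is the percentage of sampled tokens whose activation is strictly above zero. The page states neither the sample nor the threshold, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 17 active tokens out of 100,000 sampled tokens (made-up activations).
acts = [0.0] * 99_983 + [2.5] * 17
print(activation_density(acts))  # 0.017
```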