INDEX
Explanations
direct questions and inquiries
Negative Logits
Token     Logit
It        -0.16
The       -0.15
There     -0.14
оно       -0.14
utt       -0.13
ëķĮ문     -0.13
They      -0.13
Its       -0.13
ÄIJó      -0.13
LETE      -0.13
Positive Logits
Token     Logit
Wh        0.29
cui       0.28
Will      0.27
Who       0.26
Can       0.26
WHO       0.25
Do        0.25
who       0.25
what      0.24
Which     0.24
Activations Density 0.078%