INDEX
Explanations
assertions or statements of understanding related to various subjects or situations
Negative Logits
ÙĨاÙĨ     -0.16
cott       -0.15
lier       -0.14
вий        -0.14
justice    -0.14
branch     -0.14
šk         -0.14
Justice    -0.14
pp         -0.14
eller      -0.14
Positive Logits
needs      0.17
sometimes  0.16
needs      0.15
bands      0.15
ideshow    0.14
need       0.14
sometimes  0.14
985        0.14
Needs      0.14
amilia     0.13
Activations Density 0.057%