INDEX
Explanations
magical or mythical elements in discussions of concepts or items
Negative Logits
Token        Logit
itself       -0.76
is           -0.70
was          -0.67
kuris        -0.66
которому     -0.66
itself       -0.66
its          -0.64
Its          -0.61
tagHelper    -0.59
himself      -0.58
Positive Logits
Token        Logit
themselves   1.52
themselves   1.30
cherchés     1.15
are          1.03
were         0.90
jotka        0.87
eds          0.83
themſelves   0.83
אלה          0.80
those        0.79
Activations Density 3.916%