INDEX
Explanations
names or terms emphasizing importance or significance
instances of absence and presence in context
Negative Logits
roit        -0.63
'           -0.59
liv         -0.59
tempered    -0.57
Narr        -0.57
nih         -0.56
unequ       -0.54
heid        -0.52
ternity     -0.52
ヴァ        -0.52
Positive Logits
is          1.06
are         0.94
was         0.87
involves    0.81
include     0.70
relates     0.68
were        0.68
lies        0.65
Is          0.64
is          0.63
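The source does not say how the logit lists above were computed. One common approach in sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the largest and smallest resulting values. The sketch below assumes that approach; `decoder_dir`, `W_U`, `logit_effects`, and `top_logits` are hypothetical names, and the toy numbers are invented for illustration rather than taken from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Assumed method: dot the feature's decoder direction with each vocabulary
    column of the unembedding W_U (stored as d_model rows of vocab entries)."""
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(
    effects: Sequence[float], tokens: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive (descending) and k most negative (ascending) tokens."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(tokens[t], effects[t]) for t in order[:k]]
    positive = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
tokens = ["is", "are", "roit", "liv"]
decoder_dir = [1.0, 0.5]
W_U = [
    [1.0, 0.8, -0.5, -0.3],
    [0.1, 0.2, -0.3, -0.6],
]

pos, neg = top_logits(logit_effects(decoder_dir, W_U), tokens, k=2)
print([t for t, _ in pos])  # → ['is', 'are']
print([t for t, _ in neg])  # → ['roit', 'liv']
```

Under this reading, a positive value means the feature pushes the model toward predicting that token, and a negative value means it pushes away from it.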
Activation density: 0.690%
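Activation density is usually read as the share of tokens on which the feature fires. At 0.690%, that works out to about 69 of every 10,000 tokens. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. `activation_density` is a hypothetical helper, and the sample data is synthetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 69 active tokens out of 10,000.
acts = [0.0] * 9931 + [0.5] * 69
density = activation_density(acts)
print(density)  # → 0.69
```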