INDEX
Explanations
specific patterns or terms related to specific variables in a structured or experimental context
Negative Logits
Token          Logit
faſt           -0.86
windowFixed    -0.84
Efq            -0.83
houſe          -0.79
thâu           -0.77
Houſe          -0.77
purpoſe        -0.77
Jefus          -0.77
ſever          -0.74
ſta            -0.74
Positive Logits
Token              Logit
SequentialGroup    0.58
Vidite             0.54
Datuak             0.52
all                0.52
…                  0.49
the                0.48
[…]                0.48
l                  0.46
he                 0.46
data               0.46
Activations Density: 0.001%