INDEX
Explanations
descriptions of properties and reflections about experiences or observations
Negative Logits
Token    Logit
κά       -0.15
obel     -0.15
392      -0.15
igo      -0.15
nesc     -0.15
baugh    -0.14
IGO      -0.14
enko     -0.14
.ds      -0.14
aro      -0.14
Positive Logits
Token    Logit
rather   0.20
than     0.19
rather   0.18
Rather   0.17
eyh      0.15
Rather   0.15
ddl      0.15
li       0.15
-than    0.15
FY       0.15
Activations Density: 0.183%