INDEX
Explanations
phrases indicating time duration or frequency related to events or actions
Negative Logits
anks     -0.15
anker    -0.14
iez      -0.13
rench    -0.13
pus      -0.13
iais     -0.13
Beaver   -0.12
Kai      -0.12
né       -0.12
anske    -0.12
Positive Logits
-a       0.85
Ãł       0.59
.a       0.47
_a       0.45
а        0.45
Ãł       0.44
/a       0.41
+a       0.40
=a       0.40
ÃĢ       0.39
Activations Density 0.149%