Explanations
phrases indicating frequency or habitual actions
Negative Logits
ters       -0.15
atura      -0.15
ura        -0.15
unnel      -0.15
uras       -0.15
вдруг      -0.15
祝         -0.14
/-         -0.14
ёр         -0.14
rowsable   -0.14
Positive Logits
Cons       0.17
ulin       0.16
nero       0.14
cons       0.14
Cons       0.14
oleon      0.14
597        0.14
afa        0.14
λαν        0.13
rics       0.13
Activations Density 0.030%