Explanations
References to the act of reading or engaging with books
Negative Logits
usat     -0.17
outil    -0.16
æ¨       -0.16
cycle    -0.15
ssf      -0.15
zem      -0.15
atif     -0.15
aho      -0.15
wid      -0.14
ledi     -0.14
Positive Logits
aspers   0.19
ownik    0.16
igu      0.16
å¿ł      0.16
oki      0.15
As       0.15
iska     0.15
каз      0.15
ilty     0.15
Byl      0.14
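The logit lists above rank vocabulary tokens by how strongly the feature pushes their output logits down (negative) or up (positive). Dashboards of this kind often obtain these scores by projecting the feature's decoder direction onto the model's unembedding matrix. That is an assumption here, since the source does not state its method. The following is a minimal sketch under that assumption. The function name `logit_effects`, the toy decoder vector, the unembedding matrix, and the vocabulary are all hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Rank tokens by the logit change a feature's decoder direction induces.

    Assumption: score[j] = sum_i decoder_vec[i] * unembed[i][j], i.e. the
    decoder direction (length d_model) times an unembedding matrix stored as
    d_model rows x len(vocab) columns. Real pipelines may differ (e.g. by
    folding in layer norm).
    Returns (most negative, most positive), each a list of (token, score).
    """
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
decoder_vec = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.4, 0.2, -0.2, 0.0],
]
vocab = ["a", "b", "c", "d"]
negative, positive = logit_effects(decoder_vec, unembed, vocab, k=2)
print("Negative:", negative)
print("Positive:", positive)
```

Sorting once and reading both ends of the list keeps the two rankings consistent with each other. A real vocabulary of tens of thousands of tokens would normally use a vectorized matrix product instead of these nested Python loops.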
Activation density: 0.030%
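Activation density is usually read as the fraction of token positions on which the feature fires. Under that reading, 0.030% means roughly 3 active positions per 10,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the sample data are hypothetical, and the dashboard's exact threshold and token sample are not stated in the source.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature is active at 3 of 10,000 token positions.
acts = [0.0] * 10_000
for position in (12, 503, 9001):
    acts[position] = 1.7

print(f"{activation_density(acts):.3%}")  # → 0.030%
```

A very low density like this one suggests a narrowly scoped feature that fires only on rare, specific contexts.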