INDEX
Explanations
expressions of freedom and choice in various contexts
Negative Logits
Kle
-0.18
akat
-0.16
oleč
-0.16
ikan
-0.15
rada
-0.15
lag
-0.14
adr
-0.14
cla
-0.14
lez
-0.14
lap
-0.14
Positive Logits
isse
0.18
imbus
0.15
Diss
0.15
ttl
0.15
енс
0.14
Tatto
0.14
ownload
0.14
ewise
0.14
ざ
0.14
igor
0.14
Activations Density 0.010%