INDEX
Explanations
references to human involvement and social aspects in various contexts
Negative Logits
ikel -0.14
umba -0.14
stein -0.14
šil -0.14
thal -0.14
ully -0.14
grap -0.14
iano -0.13
aren -0.13
aven -0.13
Positive Logits
vester 0.17
eki 0.16
레 0.15
rew 0.15
Affero 0.14
atar 0.14
lig 0.14
vur 0.14
ĥ 0.14
перс 0.14
Activation Density: 0.059%