Explanations
Verbs and their variants denoting actions or states related to societal issues, particularly injustice or other significant actions
Negative Logits
erk           -0.16
Sav           -0.16
Ľ             -0.15
grantResults  -0.15
.mj           -0.15
ëĭĿ           -0.15
еÑĢп          -0.14
Nag           -0.14
æ¥ļ           -0.14
arms          -0.14
Positive Logits
odox          0.17
Levin         0.16
centrif       0.15
RIA           0.14
ient          0.14
chwitz        0.14
sein          0.14
Env           0.13
heet          0.13
env           0.13
Activations Density 0.377%