Explanations
terms related to negative impacts or harm caused by policies or actions
Negative Logits
ignon       -0.19
RLF         -0.17
Animalia    -0.16
ennon       -0.16
crit        -0.15
WidgetItem  -0.14
cob         -0.14
ebek        -0.14
ZR          -0.14
cis         -0.14
Positive Logits
æİī         0.20
McM         0.16
mec         0.16
Å¡ÃŃ        0.16
aken        0.15
ogue        0.15
efforts     0.15
597         0.14
Matth       0.14
ahir        0.14
Activations Density 0.138%