Explanations
concepts related to the evaluation and classification of claims and contributions based on power dynamics and societal structures
Negative Logits
enan        -0.15
allon       -0.14
nowrap      -0.14
legg        -0.14
¼           -0.14
卓          -0.14
edef        -0.13
_runner     -0.13
discontin   -0.13
zes         -0.13
Positive Logits
程度        0.30
degree      0.29
degree      0.26
depending   0.25
extent      0.24
level       0.23
Degree      0.22
degrees     0.21
extent      0.21
Degree      0.21
Activations Density: 0.256%