INDEX
Explanations
phrases related to societal issues or criticisms
Negative Logits
CLASSIFIED    -0.73
mask          -0.64
TIME          -0.64
ALSE          -0.63
%%            -0.61
PLAY          -0.61
lasted        -0.60
ATURES        -0.60
#$            -0.60
FIG           -0.60
Positive Logits
favor         1.53
favour        1.34
lieu          1.29
order         1.19
efficiency    1.15
spite         1.15
vitro         1.11
accordance    1.11
effic         1.11
anticipation  1.03
Activations Density: 0.336%