INDEX
Explanations
opinions or attitudes of approval or disapproval
Negative Logits
Expansion      -0.69
eor            -0.67
oulos          -0.64
OPE            -0.64
perty          -0.64
ILA            -0.63
PT             -0.63
ropolitan      -0.62
senal          -0.61
Examination    -0.60
Positive Logits
entimes        0.84
cast           0.83
ling           0.78
nered          0.78
erd            0.76
hearted        0.76
lier           0.75
glers          0.75
entially       0.74
ados           0.72
Activations Density: 0.017%