INDEX
Explanations
emotional and societal terms related to personal experiences and societal issues
Negative Logits
Rated     -0.70
SF        -0.65
odore     -0.64
âĢº       -0.62
TF        -0.61
Lod       -0.59
byter     -0.58
Uz        -0.58
uilt      -0.58
esides    -0.58
Positive Logits
ousel       0.85
hole        0.83
mentality   0.82
chairs      0.78
politic     0.76
fallacy     0.76
elight      0.74
osphere     0.74
arity       0.73
belt        0.71
Activations Density 0.307%