INDEX
Explanations
specific statements, opinions, or arguments made by individuals
assertions about societal responsibilities and cultural critiques
Negative Logits
partName      -0.55
Pwr           -0.51
Attempts      -0.51
Spot          -0.49
Details       -0.49
Laboratory    -0.48
Magicka       -0.48
NAD           -0.48
odder         -0.47
sqor          -0.47
Positive Logits
"'            0.65
homosexuals   0.64
persecuted    0.63
undermin      0.63
immoral       0.62
somehow       0.61
rapists       0.60
"â̦           0.60
oppress       0.57
"#            0.57
Activations Density: 1.337%