Explanations
sentences discussing beliefs, opinions, or affirmations
indications of opinion or belief related to various social and political issues
Negative Logits
Token          Logit
Pwr            -0.52
cellaneous     -0.49
iasm           -0.49
cour           -0.48
flurry         -0.47
<-             -0.47
âĶľâĶĢâĶĢ      -0.47
Explain        -0.46
partName       -0.46
:=             -0.45
Positive Logits
Token            Logit
unfairly         0.61
beneficial       0.57
genuine          0.57
sufficiently     0.55
irreversible     0.54
somehow          0.53
misunderstood    0.53
inappropriately  0.53
undermin         0.52
undes            0.51
Activations Density 1.701%