INDEX
Explanations
words related to neutrality or lack of bias
content related to neutrality and its various implications
Negative Logits
KER          -0.73
monary       -0.72
hower        -0.71
teenth       -0.66
teen         -0.64
inary        -0.64
Requires     -0.62
buck         -0.62
painfully    -0.62
urrent       -0.62
Positive Logits
confines      0.90
zone          0.87
toward        0.83
stance        0.80
reception     0.79
towards       0.79
glers         0.77
environment   0.77
demeanor      0.74
matchups      0.71
Activations Density 0.078%