INDEX
Explanations
discussions about societal improvements and the importance of education and citizen well-being
Negative Logits
atron        -0.15
овер         -0.15
reib         -0.15
coni         -0.15
rieg         -0.15
दर           -0.14
artner       -0.14
stacle       -0.14
lluminate    -0.14
Appear       -0.14
Positive Logits
instead      0.28
Instead      0.23
Instead      0.23
instead      0.21
alternative  0.19
more         0.18
other        0.17
alternate    0.16
more         0.15
emphasis     0.15
Activations Density 0.159%