INDEX
Explanations
statements related to ethical or moral considerations
Negative Logits
ager          -0.65
jug           -0.65
zan           -0.62
lieutenant    -0.61
jah           -0.60
agers         -0.59
ãĥĩãĤ£        -0.59
aron          -0.59
cradle        -0.59
Ranch         -0.58
Positive Logits
etc           1.29
etc           1.15
Basically     0.93
Lastly        0.88
thereof       0.85
[/            0.84
Similarly     0.83
ect           0.83
Additionally  0.82
These         0.80
Activations Density 0.054%