INDEX
Explanations
references to moral struggles between good and evil
Negative Logits
Надо       -0.92
somebody   -0.84
cioè       -0.84
wierd      -0.82
stuff      -0.82
stupid     -0.80
Надо       -0.79
Somebody   -0.79
whatever   -0.76
idéia      -0.75
Positive Logits
impactful      0.86
prior          0.85
utilizing      0.84
utilizes       0.79
utilize        0.78
אשר            0.77
utilized       0.74
showcasing     0.72
welcher        0.72
transitioning  0.70
Activations Density: 1.647%