INDEX
Explanations
words related to hypothetical situations, past actions, and desires
life, hypothetical scenarios
Negative Logits
itſelf        -0.88
ſelf          -0.81
Jefus         -0.79
pleaſure      -0.76
ViewFeatures  -0.76
ſelves        -0.75
ględ          -0.74
Theſe         -0.73
faſt          -0.72
maßen         -0.70
Positive Logits
am            0.59
noch          0.53
kund          0.49
yourself      0.48
ander         0.48
Myself        0.48
myself        0.47
ffin          0.47
Te            0.46
bild          0.46
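The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A common way dashboards of this kind build such lists, assumed here and not stated on the page, is to multiply the feature's decoder vector by the model's unembedding matrix and keep the most extreme entries. The sketch below illustrates that assumption only; `direct_logit_effects`, `top_and_bottom`, and the toy weights are hypothetical, and layer norm is ignored.

```python
from typing import List, Sequence, Tuple


def direct_logit_effects(decoder_row: Sequence[float],
                         unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project one feature's decoder direction onto every vocabulary token.

    Assumption: decoder_row has length d_model and unembed is laid out as
    [d_model][vocab_size], so column j is token j's unembedding vector.
    Layer norm and biases are ignored in this sketch.
    """
    d_model = len(decoder_row)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]        # ascending order: most negative first
    top = ranked[::-1][:k]     # descending order: most positive first
    return bottom, top


# Toy example with d_model = 2 and a three-token vocabulary (made-up numbers).
effects = direct_logit_effects([1.0, -0.5],
                               [[0.2, -0.4, 0.6],
                                [0.8, 0.2, -0.2]])
bottom, top = top_and_bottom(effects, ["a", "b", "c"], k=2)
print("negative:", bottom)
print("positive:", top)
```

In a real model the vocabulary has tens of thousands of tokens, which is why rare tokens such as the long-s spellings in the negative list can surface at the extremes.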
Activation Density: 4.215%
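Activation density is read here as the percentage of sampled tokens on which the feature's activation is above zero; that definition is an assumption, since the page gives only the number. A minimal sketch with a hypothetical `activation_density` helper and made-up activations:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations for eight tokens (made-up values): two are nonzero.
print(activation_density([0.0, 1.3, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0]))  # prints 25.0
```

Under this reading, a density of 4.215% means the feature fires on roughly 1 in 24 tokens.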