Explanations
concepts related to emotional intensity and personal struggles
Negative Logits
stuff      -0.19
sti        -0.16
arta       -0.14
evidence   -0.14
onso       -0.14
ongs       -0.14
emand      -0.14
ilos       -0.14
analogy    -0.14
ئ          -0.13
Positive Logits
karÅŁ            0.15
çĽĺ              0.15
few              0.15
greater          0.14
boom             0.14
orama            0.14
preferredStyle   0.14
xic              0.14
azer             0.14
Speak            0.14
Activations Density 0.419%