INDEX
Explanations
positive affirmations after interaction
Negative Logits
forgets      0.85
nightmares   0.78
pitfalls     0.76
horrors      0.72
delicacies   0.72
misschien    0.71
luoghi       0.71
olvides      0.71
بسه          0.71
специфи      0.70
Positive Logits
seeing       1.13
glad         1.04
glad         1.03
видеть       0.99
Glad         0.98
Seeing       0.95
Hearing      0.95
seeing       0.93
haber        0.93
Glad         0.92
Activations Density 0.197%