INDEX
Explanations
expressions related to thought processes and reflections
Negative Logits
Token         Logit
er            -0.73
drücken       -0.69
FilterChain   -0.69
5             -0.67
legungen      -0.67
1             -0.66
тельству      -0.65
ers           -0.64
anlagen       -0.64
рыва          -0.64
Positive Logits
Token         Logit
Thought       1.31
THOUGHT       1.30
thought       1.24
Thought       1.19
thought       1.09
thoughts      1.06
thoughts      1.02
SOT           0.98
Thoughts      0.98
Manbalar      0.86
Activations Density 0.059%