Explanations
references to cognitive processes and reasoning
Negative Logits
token             logit
vettore           -0.71
vettoriale        -0.60
meliharaan        -0.60
ampi              -0.58
őket              -0.58
popolari          -0.57
ponym             -0.56
riconoscimento    -0.56
териалы           -0.54
Награды           -0.54
Positive Logits
token             logit
thinking           3.47
Thinking           3.23
Thinking           3.06
thinking           3.00
thinkers           2.28
THINK              2.25
thinker            2.13
thought            2.06
Thought            2.06
thoughts           2.04
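The listing does not say how the logit values are produced. One common convention in feature dashboards, assumed here and not confirmed by the source, is to project the feature's output direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below illustrates that idea on a toy vocabulary; `logit_effects`, `decoder_dir`, `unembed`, and all numbers are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Rank tokens by the dot product of a feature direction with each
    token's unembedding column (highest score first).

    decoder_dir: list of d floats (hypothetical feature output direction)
    unembed:     d rows, each with len(vocab) floats (toy unembedding matrix)
    vocab:       list of token strings
    """
    effects = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        effects.append((token, score))
    return sorted(effects, key=lambda pair: pair[1], reverse=True)


# Toy, made-up values: a 2-dimensional model with a 4-token vocabulary.
vocab = ["thinking", "thought", "vettore", "the"]
decoder_dir = [1.0, -0.5]
unembed = [
    [3.0, 2.0, -0.5, 0.1],
    [-0.9, -0.2, 0.4, 0.2],
]

ranked = logit_effects(decoder_dir, unembed, vocab)
top_positive = ranked[:2]    # tokens the feature promotes most
top_negative = ranked[-2:]   # tokens the feature promotes least
```

Under this assumption, the positive list corresponds to the head of `ranked` and the negative list to its tail.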
Activations Density 0.092%
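A density figure like this is usually read as the fraction of sampled tokens on which the feature has a nonzero activation. A minimal sketch under that assumption follows; the function name `activation_density`, the threshold, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up sample: 10,000 tokens, 9 of which activate the feature.
acts = [0.0] * 10_000
for i in range(9):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

A value near 0.09% would mean the feature fires on roughly one token in a thousand, which fits a narrowly targeted feature.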