Explanations
mentions of specific individuals and their roles or actions
Negative Logits
Token        Logit
sahiptir     -0.54
bleef        -0.52
göre         -0.50
terdengar    -0.50
GOT          -0.48
מיד          -0.47
descobri     -0.47
mourut       -0.46
gevoel       -0.46
意思是       -0.46
Positive Logits
Token        Logit
unfold       1.01
coming       0.84
emerge       0.82
evolve       0.82
perform      0.76
operate      0.75
grow         0.74
come         0.73
happen       0.72
unfolding    0.72
Activations Density: 0.209%