Explanations
Statements about the actions and attributes of individuals, particularly their capabilities and experiences.
Negative Logits
Blech       -0.52
luogo       -0.49
рс          -0.49
oad         -0.49
UVWXYZ      -0.48
Unione      -0.48
yourselves  -0.48
stalo       -0.48
başlad      -0.47
jíma        -0.47
Positive Logits
himself     1.71
Himself     1.36
himself     1.35
his         1.13
His         0.94
his         0.93
He          0.93
His         0.92
因为他      0.90
的他        0.90
Activations Density: 0.647%