INDEX
Explanations
references to social media platforms and related actions
Negative Logits
Token          Logit
esprits        -0.47
preuves        -0.47
pouvoirs       -0.46
presence       -0.45
manos          -0.45
coordonnées    -0.43
Handeln        -0.42
symboles       -0.42
lecteurs       -0.42
Gründe         -0.42
Positive Logits
Token          Logit
Modal          0.53
Princip        0.48
Prin           0.47
cared          0.46
modal          0.46
aragus         0.46
princip        0.45
Mom            0.44
Mechan         0.44
]^{-           0.44
Activations Density 0.113%