INDEX
Explanations
mentions of social media platforms and charity-related terms
Negative Logits
enterOuterAlt   -0.48
CloseOperation  -0.47
käyt            -0.45
plegable        -0.40
durata          -0.40
gydy            -0.40
mijne           -0.38
sánh            -0.38
ejus            -0.38
ocasião         -0.38
Positive Logits
(missing)       0.70
neuro           0.70
neuro           0.63
nervous         0.63
Neuro           0.61
(missing)       0.60
culti           0.60
tweet           0.60
Neuro           0.59
neurological    0.59
Activations Density 0.193%