Explanations
strong language and expressions of frustration or anger
Negative Logits
lähe         -0.74
avulla       -0.72
::~          -0.71
كومونز       -0.70
viewType     -0.70
ExtendWith   -0.69
lewati       -0.66
vastaan      -0.65
hadapi       -0.65
Rial         -0.64
Positive Logits
fucking      1.08
fuck         1.01
damn         0.98
fucking      0.98
goddamn      0.96
fuck         0.95
Fucking      0.94
Fucking      0.94
FUCK         0.94
Fuck         0.93
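The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and sort the resulting per-token scores. The minimal sketch below does that in plain Python; `decoder_dir`, `unembed`, and `vocab` are hypothetical inputs, not names from any specific library.

```python
def top_and_bottom_tokens(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by an assumed direct logit effect of one feature.

    decoder_dir: list of d_model floats (hypothetical feature decoder vector)
    unembed: d_model rows, each a list of len(vocab) floats (hypothetical W_U)
    vocab: list of token strings, one per unembedding column
    """
    n_vocab = len(vocab)
    # Logit effect for token j = dot product of the decoder vector with column j.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(n_vocab)
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(n_vocab), key=lambda j: effects[j])
    bottom = [(vocab[j], effects[j]) for j in order[:k]]
    top = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return top, bottom


if __name__ == "__main__":
    top, bottom = top_and_bottom_tokens(
        decoder_dir=[1.0, 0.5],
        unembed=[[1.0, 0.0, -1.0], [2.0, 0.0, 0.0]],
        vocab=["a", "b", "c"],
        k=1,
    )
    print("positive:", top)
    print("negative:", bottom)
```

With a real model the vocabulary is large, so a vectorized matrix product would replace the nested loops; the ranking logic stays the same.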
Activation Density: 0.094%
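Activation density is usually read as the fraction of token positions on which the feature fires, so 0.094% means roughly one token in a thousand. The sketch below assumes that definition and treats any activation above zero as "firing"; the `threshold` parameter and the sample data are illustrative, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 1 token out of 1000.
    acts = [0.0] * 999 + [2.5]
    print(f"{activation_density(acts):.3%}")  # 0.100%
```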