INDEX
Explanations
phrases related to groups of people, especially in contexts suggesting a degree of seriousness or emphasis
Negative Logits
préférence      -0.56
leby            -0.53
către           -0.52
partiellement   -0.51
ailleurs        -0.50
scurt           -0.46
ägg             -0.46
jednocześnie    -0.45
difficultés     -0.44
chrétiens       -0.44
Positive Logits
outta           0.77
damn            0.73
really          0.73
absolutely      0.69
REALLY          0.68
!               0.64
Absolutely      0.62
!',             0.62
fuckin          0.62
motherfucker    0.61
Activations Density 0.207%
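
A rough sketch of how a density figure like the one above could be computed, assuming that "density" means the fraction of sampled tokens on which the feature has a non-zero activation. That reading is an assumption, and the function name, threshold, and sample data below are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 207 of which activate the feature.
sample = [0.0] * 99_793 + [1.5] * 207
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.207%"
```

Under this reading, a density of 0.207% would mean the feature fires on roughly 2 of every 1,000 tokens in the sample.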