INDEX
Explanations
expressions of surprise or frustration
NEGATIVE LOGITS
незавершена        -0.84
iſt                -0.69
ſy                 -0.68
^(@)               -0.68
vuitton            -0.68
NDEBUG             -0.65
]-->               -0.65
idéia              -0.64
AssemblyCulture    -0.61
'\\;'              -0.61
POSITIVE LOGITS
fucking            0.71
FUCKING            0.66
Fucking            0.60
lmao               0.60
goddamn            0.59
fucking            0.59
fuckin             0.57
realisation        0.57
Fucking            0.56
Fuck               0.56
Activations Density: 0.141%