INDEX
Explanations
references to food items
Negative Logits
indywidual    -0.48
eenvoudig     -0.45
eenvou        -0.43
güçlü         -0.41
iertamente    -0.41
prakty        -0.40
przede        -0.39
fisik         -0.39
berbeda       -0.39
simplement    -0.39
Positive Logits
fucking        0.93
goddamn        0.91
hipster        0.85
fuckin         0.85
shitty         0.79
apocalypse     0.79
motherfucker   0.78
hilar          0.78
Cyfarwyddwr    0.78
fucking        0.78
Activations Density 1.833%