Explanations
sentences that convey conditional reasoning or personal insights
Negative Logits
Token        Logit
skall        -0.78
muß          -0.75
läßt         -0.74
・・・・・    -0.63
denominado   -0.61
であるが     -0.58
lecz         -0.57
mußte        -0.57
yoktur       -0.56
آنان         -0.56
Positive Logits
Token        Logit
shitty       1.08
tryna        1.08
tbh          1.07
weirdly      1.06
whatnot      1.04
idk          1.04
fucked       1.02
goddamn      1.00
lemme        0.98
kinda        0.98
Activations Density: 1.698%