INDEX
Explanations
expressions of disbelief or skepticism
Negative Logits
mnoho    -0.68
lecz     -0.60
gyermek  -0.59
bowiem   -0.56
indeed   -0.56
lamang   -0.54
許多     -0.54
perhaps  -0.54
indeed   -0.53
许多     -0.52
Positive Logits
dude     1.17
dudes    1.14
weird    1.00
kinda    1.00
guy      0.96
guys     0.94
GUYS     0.90
Dude     0.90
freaked  0.89
fuckin   0.88
Activations Density 0.457%