Explanations
expressions of uncertainty or questions about knowledge
I don't know followed by uncertainty
Negative Logits
Token             Logit
complexContent    -0.42
posedge           -0.38
autorytatywna     -0.35
ьаж               -0.35
ScopeManager      -0.34
iální             -0.34
ensured           -0.33
voul              -0.33
tartalomajánló    -0.32
volon             -0.32
Positive Logits
Token       Logit
dunno       0.93
Dunno       0.90
IDK         0.82
idk         0.81
Idk         0.80
idk         0.72
Idk         0.70
不知道      0.65
我不知道    0.65
unknowns    0.64
Activations Density 0.012%
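
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is above zero. The dashboard does not state that definition, so treat it as an assumption; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the feature
    fires at all. This is not confirmed by the dashboard itself.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 25,000 token positions.
acts = [0.0] * 25000
acts[10] = 1.7
acts[4200] = 0.4
acts[19999] = 2.3

density = activation_density(acts)
print(f"{density:.3%}")  # 0.012%
```

Under this reading, a density of 0.012% corresponds to the feature firing on roughly 12 of every 100,000 token positions, which fits a feature tied to a narrow set of phrases such as "dunno" or "idk".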