INDEX
Explanations
uncertainty and inquiries about knowledge or understanding
asking what or why
Negative Logits
TagMode       -0.69
ſelf          -0.68
gynhyrchwyd   -0.67
featureID     -0.66
queſta        -0.66
SharedDtor    -0.65
ligiloj       -0.65
Houſe         -0.64
surla         -0.63
itinéraire    -0.63
Positive Logits
what    1.70
what    1.30
What    1.27
What    1.23
WHAT    1.07
WHAT    0.95
whats   0.88
whats   0.83
hvad    0.76
hva     0.75
Activations Density 0.047%