Explanations
expressions of uncertainty or confusion
Negative Logits
also        -0.62
SOUNDBITE   -0.60
zwar        -0.60
Els         -0.60
@}          -0.58
Fiske       -0.56
tså         -0.52
גם          -0.50
Els         -0.50
thus        -0.50
Positive Logits
Simplemente  0.97
Просто       0.93
Jefus        0.85
juſt         0.85
ſhe          0.80
purpoſe      0.80
ſtate        0.80
lamang       0.79
itſelf       0.79
Einfach      0.78
Activation density: 0.256%