INDEX
Explanations
references to famous literary works or passages
in languages other than English
modal verbs and multi-language fragments
Negative Logits
Interestingly  -0.54
interacted     -0.53
strukt         -0.53
Ideally        -0.53
Interestingly  -0.53
iconic         -0.52
Ideally        -0.51
komplett       -0.50
specifik       -0.50
fokus          -0.50
Positive Logits
,—          0.69
semblables  0.65
câte        0.64
mijne       0.63
[?]         0.62
fabbrica    0.62
kuin        0.61
împre       0.61
szív        0.61
medesimo    0.61
Activations Density 0.010%