Explanations
phrases indicating purpose or intention
Negative Logits
Jefus  -0.95
poffible  -0.89
ſelf  -0.88
myſelf  -0.88
themſelves  -0.87
ſeveral  -0.86
Anſ  -0.86
Theſe  -0.83
Monfieur  -0.83
chofe  -0.80
Positive Logits
be  0.67
re  0.57
}}</  0.56
daß  0.55
dynam  0.55
)':  0.55
ולה  0.53
'><  0.53
tidae  0.53
}{#  0.52
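The page does not say how the logit lists are produced. Feature dashboards of this kind commonly project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below assumes that approach; `logit_effects`, `top_logits`, and the toy vocabulary and weights are all hypothetical, chosen only to show the mechanics on a tiny example.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix w_u stored as d_model rows of vocab_size entries.
    (Assumed method; the source page does not document it.)"""
    d_model = len(decoder_dir)
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_logits(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    if k <= 0:
        return [], []
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    # Largest effects first, matching the descending order of a positive-logit list.
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens, made-up weights.
vocab = ["be", "re", "ſelf", "Jefus"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.5, 0.4, -0.6, -0.7],
    [-0.2, -0.2, 0.5, 0.4],
]
negative, positive = top_logits(logit_effects(decoder_dir, w_u), vocab, k=2)
for token, value in negative + positive:
    print(f"{token}  {value:+.2f}")
```

Sorting once and slicing both ends keeps the two lists consistent with each other; the `k <= 0` guard matters because `order[-0:]` would otherwise return the whole vocabulary.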
Activation Density: 0.172%
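The density figure is not defined on the page. A common reading is the percentage of token positions at which the feature fires (activation above zero), so 0.172% would correspond to roughly 172 firing positions per 100,000 tokens. The sketch below assumes that definition; `activation_density` and its `threshold` parameter are hypothetical names.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.
    (Assumed definition; the source page does not state one.)"""
    total = 0
    active = 0
    for act in activations:
        total += 1
        if act > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * active / total


# Hypothetical data: 172 firing positions out of 100,000 tokens.
acts = [0.0] * 99_828 + [0.5] * 172
density = activation_density(acts)
print(f"Activation Density: {density:.3f}%")
```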