INDEX
Explanations
verbs indicating actions, processes, or changes
Negative Logits
المشاركات    -0.61
Couldn       -0.57
<eos>        -0.56
'            -0.56
-            -0.53
couldn       -0.52
would        -0.49
which        -0.48
was          -0.47
would        -0.46
Positive Logits
fhew           1.01
Theſe          0.99
AddTagHelper   0.98
themſelves     0.95
chofe          0.92
uſe            0.91
becauſe        0.90
foncé          0.90
myſelf         0.89
ſtand          0.86
Activations Density: 0.436%