INDEX
Explanations
phrases discussing actions, decisions, or events with significant impact or consequences
affirmative statements about actions or conditions
Negative Logits
Gur     -0.70
Naj     -0.69
Cyr     -0.65
Nept    -0.65
Cher    -0.64
Pats    -0.62
Sly     -0.62
Swed    -0.61
Mush    -0.60
Heal    -0.60
Positive Logits
ushima          0.86
senal           0.83
nevertheless    0.81
nonetheless     0.76
ategory         0.75
akin            0.73
SourceFile      0.71
ADRA            0.71
¿½              0.70
ŃĶ              0.70
Activations Density 0.669%