INDEX
Explanations
phrases indicating time or condition, such as "when," "if," or "once."
phrases expressing self-awareness and personal actions
New Auto-Interp
Negative Logits
Token     Logit
WER       -0.72
oner      -0.70
bourg     -0.68
Kem       -0.66
Ain       -0.63
vision    -0.63
dor       -0.63
far       -0.63
├         -0.62
Mong      -0.61
Positive Logits
Token        Logit
idth         0.74
collide      0.72
mble         0.72
roup         0.71
confronted   0.70
Chimera      0.67
entimes      0.65
ceases       0.65
otherwise    0.64
erva         0.61
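The source does not say how the logit values above were obtained. Feature dashboards of this kind commonly list the vocabulary tokens whose unembedding columns align most (positive logits) and least (negative logits) with the feature's decoder direction. The sketch below assumes that definition. `top_logit_effects`, `decoder_vec`, and `W_U` are hypothetical names, and a real dashboard may normalize or rescale the scores differently.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_vec: Sequence[float],
    W_U: Sequence[Sequence[float]],  # assumed layout: d_model rows x vocab_size columns
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: rank tokens by the feature's direct logit effect."""
    # Direct effect on each token's logit: dot product of the decoder vector
    # with that token's unembedding column (i.e. decoder_vec @ W_U).
    effects = [sum(d * w for d, w in zip(decoder_vec, column)) for column in zip(*W_U)]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(effects)), key=effects.__getitem__)
    negative = [(vocab[i], effects[i]) for i in order[:k]]
    positive = [(vocab[i], effects[i]) for i in reversed(order[-k:])]
    return negative, positive


# Toy example: d_model = 2, six-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, 0.0],
    [[0.5, -0.2, 0.9, -0.7, 0.1, 0.0], [1.0] * 6],
    ["a", "b", "c", "d", "e", "f"],
    k=2,
)
print(neg)  # → [('d', -0.7), ('b', -0.2)]
print(pos)  # → [('c', 0.9), ('a', 0.5)]
```

Because the scores come from a single matrix product, they describe only the feature's direct path to the output logits. Effects routed through later layers are not captured.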
Activation density: 0.337%
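Activation density is usually the percentage of token positions in an evaluation corpus on which the feature is active. The sketch below assumes "active" means an activation strictly above a threshold of zero. The corpus and threshold behind the reported figure are not stated in the source, and `activation_density` is a hypothetical name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on 3 of 1,000 token positions.
acts = [0.0] * 1000
acts[:3] = [0.5, 1.2, 0.1]
print(activation_density(acts))  # → 0.3
```

Under this definition, a density of 0.337% corresponds to roughly 337 active positions per 100,000 tokens.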