INDEX
Explanations
goals, problems, and eye contact
Negative Logits
ين    0.91
ח    0.87
ان    0.82
at    0.78
ون    0.78
на    0.77
د    0.75
ק    0.75
اب    0.74
in    0.73
Positive Logits
↵↵    0.68
-    0.67
Öncelikle    0.63
У    0.57
,    0.56
ありますが    0.55
),    0.55
),    0.54
},    0.54
Α    0.52
Activations Density: 0.425%