INDEX
Explanations
phrases indicating plausibility or potential claims
plausible claims or predictions
Negative Logits
Tikang           -0.57
Comprometido     -0.53
незавершена      -0.51
ChildScrollView  -0.51
CreateTagHelper  -0.50
Camila           -0.50
efois            -0.49
Erfolge          -0.48
comings          -0.48
źródło           -0.48
Positive Logits
א  1.44
א  1.23
הא  0.99
הא  0.73
מא  0.72
ָא  0.62
וא  0.58
בא  0.54
שא  0.54
อ  0.51
Activations Density: 0.003%