Explanations
phrases related to finality or conclusions
phrases expressing difficulty or challenges
Negative Logits
Logit   Token
-0.79   condem
-0.74   conduc
-0.71   clitor
-0.71   Instr
-0.69   ulators
-0.69   raints
-0.68   hemor
-0.67   organisers
-0.67   aroused
-0.67   lapt
Positive Logits
Logit   Token
1.16    ──
1.06    U+FE0F (variation selector-16, invisible)
0.98    ────
0.96    ה
0.88    ══
0.85    λ
0.85    ----------------------------------------------------------------
0.84    ו
0.83    ────────
0.82    ever
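Dashboards of this kind often print token strings in a byte-level BPE alphabet, where each UTF-8 byte is shown as a stand-in printable character (for example, `âĶĢ` standing for the box-drawing character `─`). The sketch below shows one way to map such strings back to readable text. It assumes the tokenizer uses the GPT-2 style byte-to-character table, which this page does not confirm; `decode_token` is a hypothetical helper name, not part of any library.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character.

    Assumption: the dashboard's tokenizer uses this table.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned the next free code point from 256 upward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level token string back to text (hypothetical helper).

    Tokens containing characters outside the table (for example, text that is
    already decoded) are returned unchanged.
    """
    if any(ch not in BYTE_DECODER for ch in token):
        return token
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # Some tokens hold only part of a multi-byte character, so replace
    # undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e2\u0136\u0122", "\u00d7\u0136", "ever"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as `ever` pass through unchanged, so the helper can be applied to a whole logit list without special-casing.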
Activations Density 0.233%