INDEX
Explanations
Instances where reasons or justifications are stated
Negative Logits
ształ            -0.47
utafitiHapana    -0.47
Vil              -0.44
arim             -0.42
оп               -0.42
Am               -0.42
过               -0.42
過               -0.41
An               -0.39
']")             -0.39
Positive Logits
verwijspagina     0.94
RegistryLite      0.93
perché            0.79
kasarigan         0.78
porque            0.77
porque            0.76
because           0.75
之所以            0.74
becauſe           0.74
Çünkü             0.72
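Lists like the Positive and Negative Logits above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k largest and k smallest values. The sketch below assumes that convention. `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical names, not taken from the source.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, value) pairs.

    Assumption: a feature's effect on each output token is its decoder
    direction projected through the unembedding matrix (decoder_dir @ W_U).
    """
    effects = decoder_dir @ W_U   # one value per vocabulary entry
    order = np.argsort(effects)   # indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["because", "porque", "the", "Vil"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.75, 0.77, 0.0, -0.44],
                [0.30, 0.10, 0.2, 0.50]])
pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=2)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid the full sort.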
Activation density: 0.371%
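Activation density is usually reported as the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The helper `activation_density` and its `threshold` parameter are hypothetical, not taken from the source.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: density counts activations strictly greater than zero.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# Example: a feature active on 371 of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:371] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```

At this density the feature is active on roughly 1 token in 270, which is consistent with a narrowly scoped feature such as one tracking stated reasons.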