Explanations
technical terms and code-related language
Negative Logits
kasarigan     -0.82
كومونز        -0.82
étoient       -0.80
indígen       -0.78
avoient       -0.75
wikipagina    -0.75
ſtate         -0.75
pleaſure      -0.74
auroit        -0.73
ब्रेकडाउन       -0.72
Positive Logits
1      0.64
2      0.62
,      0.61
7      0.59
4      0.59
↵↵     0.58
↵↵↵    0.57
8      0.57
3      0.57
5      0.57
Activations Density: 7.620%