Explanations
phrases indicating evaluations or ratings of quality
Negative Logits
Token        Logit
↑            -0.57
int          -0.48
<em>         -0.47
T            -0.46
########.    -0.46
stu          -0.45
(blank)      -0.45
مصادر        -0.45
(blank)      -0.45
i            -0.44
Positive Logits
Token        Logit
purpoſe      1.05
myſelf       1.04
houſe        1.04
faſt         0.97
unſ          0.97
reaſon       0.96
ſame         0.94
Houſe        0.94
ſelf         0.94
ſind         0.92
Activations Density: 0.211%