Explanations
phrases analyzing perspectives or views on specific issues
Negative Logits
Token    Logit
ly       -0.54
Ste      -0.49
تانيه    -0.48
’        -0.47
or       -0.44
dai      -0.44
Che      -0.44
das      -0.44
/        -0.43
and      -0.42
Positive Logits
Token    Logit
diſt     0.75
ſal      0.75
feroit   0.74
ſtate    0.73
uſ       0.72
nmax     0.72
ſur      0.71
ſtand    0.70
ſtill    0.70
eſſ      0.69
Activations Density: 0.257%