INDEX
Explanations
statements reflecting opinions or evaluations about entities or situations
Negative Logits
betweenstory    -0.82
hyrchwyd        -0.76
pleaſure        -0.74
oprot           -0.68
houſe           -0.68
ſmall           -0.68
ſever           -0.67
Majefty         -0.66
occaf           -0.65
Shakspeare      -0.65
Positive Logits
IRQn            0.58
WriteBarrier    0.55
QUI             0.54
fast            0.51
cely            0.48
되지            0.48
كومونز          0.48
szóci           0.48
sé              0.48
angan           0.47
Activations Density: 0.372%