Explanations
adjectives that describe intensity or degree
Negative Logits
Token           Logit
,               -0.46
<eos>           -0.45
he              -0.44
&&              -0.43
године          -0.42
so              -0.42
(               -0.40
ResumeLayout    -0.39
So              -0.38
!               -0.38
Positive Logits
Token           Logit
myſelf          1.06
^(@)            1.00
raiſ            1.00
leſs            0.99
Monfieur        0.99
reaſon          0.97
pleaſure        0.94
ſelf            0.93
Reſ             0.92
themſelves      0.92
Activations Density 0.700%