Explanations
Phrases and terms that relate to various forms of importance and classification in texts.
Negative Logits
Token        Logit
cia          -0.17
apy          -0.16
pio          -0.16
assi         -0.15
alls         -0.15
ьогодні      -0.15
ocate        -0.15
fore         -0.14
alling       -0.14
agn          -0.14
Positive Logits
Token        Logit
S            0.28
.getS        0.17
abstraction  0.17
erton        0.16
BT           0.16
nict         0.15
BT           0.15
immel        0.15
čas          0.15
anto         0.15
Activations Density 0.030%