Explanations
references to GitHub repositories and associated metadata
Negative Logits
smooth      -0.51
đồ          -0.50
Eksterne    -0.50
smooth      -0.49
ecg         -0.49
smo         -0.49
sweet       -0.48
pośred      -0.47
↵           -0.47
↵↵          -0.47
Positive Logits
KP    1.28
KF    1.15
KC    1.15
Ks    1.15
KT    1.15
KG    1.13
kp    1.12
KB    1.12
KM    1.12
KS    1.12
Activations Density 0.704%