INDEX
Explanations
repeated references to the same concepts or entities
Negative Logits
Савезне      -0.55
ftagPool     -0.52
s            -0.48
good         -0.47
'][]         -0.47
basic        -0.47
\&           -0.46
etc          -0.46
is           -0.45
in           -0.45
Positive Logits
ſche            0.95
tartalomajánló  0.91
ſtate           0.89
myſelf          0.88
Efq             0.86
ſy              0.86
fubject         0.85
itſelf          0.84
raiſ            0.84
theſe           0.83
Activation Density: 0.563%