INDEX
Explanations
phrases indicating comparison or measurement thresholds
Negative Logits
Token       Logit
752         -0.16
EDIUM       -0.15
yw          -0.15
closer      -0.14
uild        -0.14
CAPITAL     -0.14
amedi       -0.14
unte        -0.14
ilon        -0.14
rix         -0.14
Positive Logits
Token       Logit
(<          0.23
ONS         0.18
istrovství  0.18
lings       0.15
_iff        0.15
orama       0.14
OTAL        0.14
ling        0.14
(=)         0.14
ever        0.14
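Lists like the two tables above are the k most positive and k most negative entries of a per-token score. The sketch below shows one way to produce them. It assumes the feature's effect on each vocabulary token's logit is already available as a single number. For SAE features this is often the decoder direction projected through the unembedding, but that is an assumption here. The names `logit_effects` and `top_bottom_tokens` and the toy mapping are hypothetical.

```python
from typing import Dict, List, Tuple


def top_bottom_tokens(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs.

    `logit_effects` is a hypothetical mapping from token string to the
    feature's (assumed) contribution to that token's logit.
    """
    # Sort ascending by score: most negative first, most positive last.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy scores copied from a few rows of the tables above.
toy = {"(<": 0.23, "ONS": 0.18, "(=)": 0.14, "752": -0.16, "EDIUM": -0.15, "closer": -0.14}
pos, neg = top_bottom_tokens(toy, k=2)
print(pos)  # [('(<', 0.23), ('ONS', 0.18)]
print(neg)  # [('752', -0.16), ('EDIUM', -0.15)]
```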
Activation Density: 0.038%
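Activation density is commonly read as the share of sampled tokens on which the feature's activation is nonzero. Under that reading, 0.038% means about 38 active tokens per 100,000. The sketch below computes the figure under that assumption. The `activation_density` helper, the zero threshold, and the synthetic sample are hypothetical illustrations, not the dashboard's actual pipeline.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumes density = (# tokens with activation > threshold) / (# tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 38 active tokens out of 100,000.
sample = [0.0] * 99_962 + [1.5] * 38
print(f"{activation_density(sample):.3f}%")  # 0.038%
```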