Explanations
references to a specific news source or website
Negative Logits
Token        Logit
Micro        -0.65
forcing      -0.62
partition    -0.61
insert       -0.61
plate        -0.59
evolution    -0.59
punishing    -0.59
ļéĨĴ         -0.57
indec        -0.57
atomic       -0.56
Positive Logits
Token        Logit
ws           4.45
wed          2.37
wt           1.45
wd           1.43
wn           1.41
wy           1.40
wl           1.40
wic          1.37
wi           1.35
WS           1.34
Activations Density: 0.014%