INDEX
Explanations
references to political topics or entities
Negative Logits
Token        Logit
(no text)    -0.57
i            -0.50
Autowired    -0.49
p            -0.47
p            -0.46
g            -0.44
(no text)    -0.43
ra           -0.43
l            -0.43
van          -0.42
Positive Logits
Token        Logit
Roskov       1.18
SharedCtor   1.07
myſelf       1.06
ſtate        1.03
pleaſure     1.01
itſelf       0.98
MenuView     0.98
fubject      0.97
ſelf         0.97
juſ          0.96
Activations Density 0.232%