Explanations
mentions of specific political figures
Negative Logits
ThroughAttribute  -0.46
InstrumentedTest  -0.46
оригіналу         -0.41
Passer            -0.41
Mate              -0.39
xfc               -0.39
AVIA              -0.39
makedirs          -0.38
ÕES               -0.38
lad               -0.38
Positive Logits
Obrador           0.71
WithIOException   0.62
                  0.55
lusconi           0.50
phosa             0.49
mergeFrom         0.49
Chomsky           0.49
Gorbachev         0.49
Wikimedijinoj     0.48
celotti           0.47
Activations Density 0.799%