Explanations
arguments or claims related to economics and political discourse
NEGATIVE LOGITS
guard       -0.16
Guard       -0.15
even        -0.15
guard       -0.15
guarding    -0.15
甚至        -0.14
_guard      -0.14
even        -0.14
.guard      -0.14
iyon        -0.14
POSITIVE LOGITS
directly    0.35
direct      0.32
direct      0.32
.direct     0.29
DIRECT      0.27
Direct      0.27
Direct      0.27
_direct     0.26
diret       0.25
direkt      0.24
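The page does not say how these two lists are produced. One common approach for sparse-autoencoder features, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and read off the largest and smallest entries. The sketch below uses random toy weights. `w_dec`, `W_U`, and `top_logit_effects` are hypothetical names, and the sketch ignores any final layer norm the real model may apply.

```python
import numpy as np

def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary (assumed method).

    w_dec: (d_model,) decoder vector for the feature (hypothetical input)
    W_U:   (d_model, d_vocab) unembedding matrix (hypothetical input)
    vocab: list of d_vocab token strings
    Returns (positive, negative): lists of (token, logit) pairs, strongest first.
    """
    logits = w_dec @ W_U          # (d_vocab,) direct effect on each token's logit
    order = np.argsort(logits)    # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with random weights (illustrative only, not the real model)
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
vocab = [f"tok{i}" for i in range(d_vocab)]
w_dec = rng.standard_normal(d_model)
W_U = rng.standard_normal((d_model, d_vocab))
pos, neg = top_logit_effects(w_dec, W_U, vocab, k=5)
print(pos)
print(neg)
```

Under this reading, the near-duplicate tokens in each list (for example `direct` twice) would be distinct vocabulary entries, such as variants with and without a leading space, whose whitespace was lost in the scrape.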
Activation density: 0.104%
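A density of 0.104% means the feature fires on roughly one token in a thousand. Below is a minimal sketch of that calculation. It assumes "active" means an activation strictly above zero, because the page does not state its threshold. `activation_density` is a hypothetical helper, and the activations are synthetic.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic activations: the feature fires on 104 of 100,000 tokens
acts = np.zeros(100_000)
acts[:104] = 0.5
print(f"{activation_density(acts):.3%}")  # → 0.104%
```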