Explanations
phrases related to illegal or unregulated activities
Negative Logits
itto    -0.18
linear  -0.17
Linear  -0.15
linear  -0.15
Linear  -0.15
436     -0.14
ymi     -0.14
vanced  -0.14
sgi     -0.14
bbe     -0.14
Positive Logits
informal    0.34
amateur     0.34
amateurs    0.32
DIY         0.31
Amateur     0.30
homemade    0.27
unofficial  0.26
民          0.26
makeshift   0.25
inform      0.25
Activations Density 0.179%