INDEX
Explanations
terms associated with legality and illegal activities
Negative Logits
mijne       -0.92
zijne       -0.91
avoient     -0.89
idéia       -0.86
Bewußt      -0.85
ambién      -0.82
étoient     -0.80
Italij      -0.80
enfans      -0.78
miniaturka  -0.78
Positive Logits
illegal     0.77
legal       0.67
index       0.60
index       0.59
Legal       0.56
Il          0.56
            0.54
↵↵          0.54
Index       0.54
uuid        0.52
Activations Density 0.447%