INDEX
Explanations
phrases related to injustice and social commentary
Negative Logits
boro         -0.16
ypress       -0.15
OffsetTable  -0.15
uyện         -0.14
otti         -0.14
onna         -0.14
itel         -0.14
wich         -0.13
elves        -0.13
owski        -0.13
Positive Logits
590          0.16
πού          0.15
arena        0.15
ialis        0.14
lint         0.14
962          0.14
ausible      0.14
ntl          0.14
ury          0.14
645          0.13
Activations Density 0.106%