INDEX
Explanations
specific phrases that denote the beginning of a new section or topic in a text
Negative Logits
Marshaller    -0.63
TestBed       -0.60
bitField      -0.57
$")           -0.56
%")           -0.56
TagMode       -0.55
Parcelable    -0.53
fevere        -0.52
!")           -0.51
$",           -0.51
Positive Logits
racism        0.74
Racism        0.73
racist        0.70
gender        0.70
racial        0.69
Racism        0.66
gender        0.65
racially      0.64
racism        0.63
Racial        0.59
Activations Density: 0.475%