INDEX
Explanations
references to social demographics and representation
after single letters
programming and groups
Negative Logits
CommandType     -0.59
mentre          -0.59
versus          -0.57
Versus          -0.56
vs              -0.56
Whereas         -0.54
不像            -0.54
متعلقه          -0.53
betweenstory    -0.53
Vs              -0.53
Positive Logits
itſelf          0.69
purpoſe         0.65
наоборот        0.64
pleaſure        0.63
defire          0.63
houſe           0.63
diſt            0.63
neceff          0.63
Reſ             0.61
ones            0.61
Activations Density 0.772%