INDEX
Explanations
words related to criticism and analysis, particularly in the context of media, politics, and entertainment
Negative Logits
lez       -0.66
eks       -0.66
Tomorrow  -0.63
hello     -0.63
Tonight   -0.63
Borough   -0.62
Republic  -0.61
oj        -0.60
abuse     -0.59
Hoy       -0.59
Positive Logits
systematically      0.87
remarkably          0.86
geographically      0.83
extraordinarily     0.83
fundamentally       0.82
incredibly          0.81
uniformly           0.81
consistently        0.81
disproportionately  0.80
relatively          0.80
Activations Density 0.462%