Explanations
references to the Star Wars universe and its characters
Negative Logits
Token            Logit
ujednoznacz      -0.56
bridegroom       -0.51
}],              -0.50
Daylight         -0.49
étudi            -0.48
hôtes            -0.48
RegressionTest   -0.48
hosts            -0.47
riwal            -0.46
bride            -0.46
Positive Logits
Token            Logit
Jedi             0.81
Jedi             0.73
jedi             0.72
joaat            0.70
Kenobi           0.69
Mandalorian      0.68
Skywalker        0.68
soka             0.67
BorderRadius     0.66
Yoda             0.65
Activations Density: 0.824%