INDEX
Explanations
words related to legal actions and specific names, including surnames
Negative Logits
ADRA      -0.72
raints    -0.67
popular   -0.64
liest     -0.63
ulating   -0.62
kok       -0.59
UAL       -0.59
quo       -0.59
SIM       -0.58
abis      -0.58
Positive Logits
istics    0.96
oad       0.95
gren      0.92
ogue      0.85
oaded     0.83
ibrary    0.82
iffe      0.81
anguage   0.78
gage      0.76
agic      0.75
Activations Density 12.182%