Explanations
ratings given to various entities or items
instances of ratings or scores
Negative Logits
Token      Logit
ansson     -0.87
adr        -0.82
working    -0.75
Alz        -0.75
coming     -0.73
ocrates    -0.73
filled     -0.66
ilus       -0.65
Tik        -0.65
ortun      -0.65
Positive Logits
Token      Logit
rating     1.24
ratings    1.18
Ratings    1.08
Rating     1.00
Rating     0.98
rated      0.94
★★         0.87
rating     0.79
score      0.72
Beer       0.72
Activations Density 0.013%
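
A note on reading the token columns: byte-level BPE vocabularies in the GPT-2 style store each token in a printable stand-in alphabet, so in raw form the two-star token appears as `âĺħâĺħ` and a leading space appears as `Ġ`. The sketch below assumes this feature comes from a model with that kind of tokenizer (the page does not say so); the helper name `decode_token` is hypothetical. It rebuilds the byte-to-character table and inverts it to recover readable text.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2 style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    printable = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is mapped to a code point starting at 256.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: stand-in character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw_token: str) -> str:
    """Turn a raw vocabulary string into readable text (hypothetical helper)."""
    raw_bytes = bytes(BYTE_DECODER[ch] for ch in raw_token)
    # Tokens may end in the middle of a UTF-8 sequence, hence errors="replace".
    return raw_bytes.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("\u00e2\u013a\u0127\u00e2\u013a\u0127"))  # the raw star token
    print(repr(decode_token("\u0120rating")))  # leading-space variant of "rating"
```

Tokens that look identical in the tables (for example the two `rating` rows) may differ only in such a leading-space byte, which this kind of decoding would make visible when applied to the raw vocabulary strings. The helper raises `KeyError` on text that has already been decoded, since characters like `★` are not part of the stand-in alphabet.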