Explanations
instances of the word "review" and its variants
Negative Logits
któ       -0.71
Piet      -0.64
Schulte   -0.62
Alba      -0.62
Dodson    -0.61
ausz      -0.60
Nat       -0.60
Eb        -0.58
getY      -0.58
am        -0.58
Positive Logits
Review    1.65
review    1.60
REVIEW    1.58
Reviews   1.57
reviews   1.57
review    1.57
Review    1.56
Reviews   1.54
REVIEW    1.52
reviewer  1.45
Activations Density: 0.074%