INDEX
Explanations
words related to the concept of belief or opinion
the presence of the word "there" in various contexts
Negative Logits
EA          -0.63
Armored     -0.60
Dish        -0.57
Khe         -0.57
elta        -0.55
Maw         -0.54
ointed      -0.54
Cum         -0.53
Greenwich   -0.53
SEE         -0.53
Positive Logits
abouts      1.51
upon        1.17
fore        1.03
shouldn     0.97
isn         0.95
ain         0.95
wasn        0.95
weren       0.93
aren        0.93
after       0.91
Activations Density 0.124%
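
The density figure above is given without a definition. One common reading, assumed here and not confirmed by the source, is the fraction of sampled token positions on which the feature's activation is nonzero. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and serve only to illustrate that reading:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 8 token positions.
sample = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a value of 0.124% would mean the feature is active on roughly one token in every 800 sampled.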