INDEX
Explanations
words related to common or usual things
the concept of "typical" or "normal" characteristics
Negative Logits
heed         -0.76
appointed    -0.72
enced        -0.70
acus         -0.69
nuts         -0.66
rencies      -0.65
mentation    -0.65
Tickets      -0.65
aughter      -0.64
arching      -0.64
Positive Logits
fare         0.86
americ       0.82
weekday      0.81
sized        0.81
deviation    0.81
typ          0.80
American     0.79
ized         0.78
Western      0.78
western      0.77
Activations Density 0.072%
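
A density figure like the one above is commonly read as the fraction of sampled token positions on which the feature fires. The sketch below assumes that reading. The function name, the zero threshold, and the sample data are all hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions with a
    nonzero (here: > threshold) activation. The source does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 7 of which activate the feature.
sample = [0.0] * 9993 + [1.5, 0.2, 3.1, 0.9, 0.4, 2.2, 0.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.072% would correspond to roughly 7 activating positions per 10,000 tokens sampled.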