INDEX
Explanations
instances of the word "of"
phrases that refer to different "versions of" something
Negative Logits
ibling      -0.82
resy        -0.79
oji         -0.76
teasp       -0.76
urers       -0.75
ktop        -0.72
rences      -0.71
ering       -0.69
entimes     -0.67
erity       -0.66
Positive Logits
course         0.73
history        0.62
thood          0.62
reality        0.59
Meier          0.59
Horton         0.58
Chicken        0.58
theirs         0.57
Christianity   0.57
Divinity       0.56
Activations Density: 0.105%