INDEX
Explanations
references to roommates and dormitories
references to roommates and dorm-related experiences
Negative Logits
Spear      -0.79
arte       -0.78
Chain      -0.74
ifact      -0.72
Force      -0.71
ingers     -0.69
stones     -0.69
asive      -0.69
force      -0.68
Insight    -0.67
Positive Logits
roomm       3.10
roommate    3.01
dorm        2.57
classmate   1.49
classmates  1.29
cowork      1.28
coworkers   1.25
motel       1.20
squirrel    1.16
bunk        1.10
Activations Density: 0.041%